## Dyson School of Design Engineering
### DE1-Computing -1, Assignment-1

**This is an open book assessment under exam conditions.**

**This assignment carries 40% of the final mark.**

This is a Python coding assignment. Please submit your answers as either:

* one Jupyter notebook file with code cells, **or** 
* a zip file of separate Python files, one for each question.

Note that you will get marks for your correct attempts even if the code gives errors for some lines.
An 'A' grade answer would have:

1. Compact code that runs without errors.
1. Detailed comments outlining the plan to solve each problem.
1. Suitable variable names with explanations.
1. Evidence of intermediate testing, i.e. print statements to check the values of variables.
1. A discussion on answers.


## Question 1
Write Python code to represent the following information in a data structure, e.g. lists, arrays, or dictionaries.

|Country|Area in Square km|No. of National Languages|Population|
|---|---|---|---|
|Russia|17,075,400|1|143,895,551|
|Canada|9,984,670|2|37,279,811|
|USA|9,826,675|No official National language|329,536,482|
|China|9,598,094|1|1,420,062,022|
|Brazil|8,514,877|1|212,392,717|
|Australia|7,617,930|1|25,088,636|
|India|3,287,263|14|1,368,737,513|


### Summary comment:
Some students hard-coded the solution to this problem. Those who scored high used the `type()` function to identify which country had a text entry for the number of official languages. One can of course edit the original data and use `None` or `0`, but then you lose the information in the text string, which may be useful elsewhere. So it is good practice not to edit the data just to make the coding easier.

Another common feature of good answers was keeping references to the data clear at all times. Integer index values are not intuitive to someone reading your code. Dictionaries were a good way to avoid this, because each value is looked up by a descriptive key.
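
To illustrate the point about readable references, the short sketch below contrasts a positional lookup with a keyed lookup. The two-country subset and the names `rows` and `by_name` are made up for this illustration and are not part of the model solution.

```python
# Hypothetical two-country subset of the table, stored two ways.

# 1. List of lists: the reader must remember what each position means.
rows = [
    ['Russia', 17075400, 1, 143895551],
    ['Canada', 9984670, 2, 37279811],
]
canada_population_by_index = rows[1][3]  # which row? which column?

# 2. Dictionary of dictionaries: the keys document themselves.
by_name = {
    'Russia': {'Area': 17075400, 'Languages': 1, 'Population': 143895551},
    'Canada': {'Area': 9984670, 'Languages': 2, 'Population': 37279811},
}
canada_population_by_key = by_name['Canada']['Population']

# Both lookups return the same value; only the readability differs.
print(canada_population_by_index == canada_population_by_key)
```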

Model solution below, followed by alternatives.

In [1]:
# I will structure the data in a dictionary by country, this allows me to easily 
# look up each country's information as needed. Each country will be represented
# by a dictionary that will hold a variety of information.
#
# The variable name 'countries' also adheres to PEP8 standards, and clearly 
# indicates what the variable is storing.
#
# I use spacing to be able to clearly check my data structure to ensure I did
# not make any typing mistakes.

countries = {
    'Russia':   {'Area':17075400, 'Languages':1,             'Population':143895551},
    'Canada':   {'Area':9984670,  'Languages':2,             'Population':37279811},
    'USA':      {'Area':9826675,  'Languages':'No official', 'Population':329536482},
    'China':    {'Area':9598094,  'Languages':1,             'Population':1420062022},
    'Brazil':   {'Area':8514877,  'Languages':1,             'Population':212392717},
    'Australia':{'Area':7617930,  'Languages':1,             'Population':25088636},
    'India':    {'Area':3287263,  'Languages':14,            'Population':1368737513}
}

# Now I will test that I can extract data correctly from the structure.
# I use a suitable variable name that indicates the data being stored.

india_languages = countries['India']['Languages']
print("India has 14 national languages. The data structure says it has {}.".format(india_languages))

India has 14 national languages. The data structure says it has 14.


An alternative answer below, adapted from a student's submission.

In [4]:
# Firstly, transfer the table information into lists for each column.

countries = ['Russia', 'Canada', 'USA', 'China', 'Brazil', 'Australia', 'India']
areas = [17075400, 9984670, 9826675, 9598094, 8514877, 7617930, 3287263]
languages = [1, 2, 'No official', 1, 1, 1, 14]
populations = [143895551, 37279811, 329536482, 1420062022, 212392717, 25088636, 1368737513]

# Store our lists in a dictionary, so the data is stored in one data structure.
# This step may not be completely necessary.

countries = {
    'country': countries,
    'area': areas,
    'language': languages,
    'population' : populations
}

# Now we can loop through the data to show that it is stored correctly.

for i in range(len(countries['country'])):
    print('Index {}:  {} has area of {} square km, with {} national language(s), and population of {} people'
          .format(i, countries['country'][i], countries['area'][i],
                  countries['language'][i], countries['population'][i]))

Index 0:  Russia has area of 17075400 square km, with 1 national language(s), and population of 143895551 people
Index 1:  Canada has area of 9984670 square km, with 2 national language(s), and population of 37279811 people
Index 2:  USA has area of 9826675 square km, with No official national language(s), and population of 329536482 people
Index 3:  China has area of 9598094 square km, with 1 national language(s), and population of 1420062022 people
Index 4:  Brazil has area of 8514877 square km, with 1 national language(s), and population of 212392717 people
Index 5:  Australia has area of 7617930 square km, with 1 national language(s), and population of 25088636 people
Index 6:  India has area of 3287263 square km, with 14 national language(s), and population of 1368737513 people


## Question 2
Extend the code in Question 1 above to calculate how large the area of China is, as a percentage of the total area of all countries.

### Summary comment: 

Those who used a dictionary found it easier to answer this question. Of course, some students had separate lists or tuples that worked. Remember though that tuples are immutable, meaning once they are created, the values cannot be edited or added to. This is not ideal for many data structures.
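
The remark about tuples can be checked directly. The following is a minimal sketch, using two area values from the table and illustrative variable names, of what happens when code tries to modify a tuple after it has been created.

```python
# A list can be changed after creation; a tuple cannot.
areas_list = [17075400, 9984670]
areas_list.append(9826675)        # fine: lists are mutable

areas_tuple = (17075400, 9984670)
try:
    areas_tuple[0] = 0            # item assignment is not allowed on tuples
    tuple_was_changed = True
except TypeError as error:
    tuple_was_changed = False
    print("Tuples are immutable:", error)

print(areas_list)
print(areas_tuple)
```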

Fragmented data structures lead to messy code when the computations become complex. Having all data in one structure makes it easier to perform computations on multiple columns of a data table. 
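
As a sketch of what "one structure" buys you, the example below totals two different columns of the Question 1 table with the same pattern. It iterates over `countries.values()`, which is an alternative to the key-based loops in the model answers rather than a replacement for them; the names `info` and `overall_density` are illustrative.

```python
countries = {
    'Russia':    {'Area': 17075400, 'Languages': 1,             'Population': 143895551},
    'Canada':    {'Area': 9984670,  'Languages': 2,             'Population': 37279811},
    'USA':       {'Area': 9826675,  'Languages': 'No official', 'Population': 329536482},
    'China':     {'Area': 9598094,  'Languages': 1,             'Population': 1420062022},
    'Brazil':    {'Area': 8514877,  'Languages': 1,             'Population': 212392717},
    'Australia': {'Area': 7617930,  'Languages': 1,             'Population': 25088636},
    'India':     {'Area': 3287263,  'Languages': 14,            'Population': 1368737513},
}

# Each inner dictionary is one table row, so any numeric column can be
# totalled in the same way without keeping separate lists in step.
total_area = sum(info['Area'] for info in countries.values())
total_population = sum(info['Population'] for info in countries.values())

# Combining two columns is then a one-line calculation.
overall_density = total_population / total_area

print("Total area: {}".format(total_area))
print("Total population: {}".format(total_population))
print("Overall density: {:.2f} people per square km".format(overall_density))
```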

Of the students who used lists, some did well to use the built-in `sum()` function, which can take a list as its argument. Some did particularly well to use the list `index()` method to keep references in terms of clear strings (rather than unintuitive integers).

Some students calculated the wrong answer simply because of incorrect data entry. This did not lose many marks, but such mistakes lead to wrong interpretations of data. Frequent use of comments and print statements improved the quality of answers and thus increased marks.

Note that no modules needed importing for this task, as everything could be achieved using the built-in Python functions.

*Model answer below*

In [11]:
countries = {
    'Russia':   {'Area':17075400, 'Languages':1,             'Population':143895551},
    'Canada':   {'Area':9984670,  'Languages':2,             'Population':37279811},
    'USA':      {'Area':9826675,  'Languages':'No official', 'Population':329536482},
    'China':    {'Area':9598094,  'Languages':1,             'Population':1420062022},
    'Brazil':   {'Area':8514877,  'Languages':1,             'Population':212392717},
    'Australia':{'Area':7617930,  'Languages':1,             'Population':25088636},
    'India':    {'Area':3287263,  'Languages':14,            'Population':1368737513}
}

# Q2
# To find the percentage of total land area, I will need the total land area, 
# and China's area.
#
# I will use a list comprehension to quickly create a list of all the areas from
# my data structure that is then passed to the sum() function. I use clear
# variable names that adhere to PEP8 standard.

total_area = sum([countries[country]['Area'] for country in countries.keys()])
print("Total area is {}".format(total_area))

# Since my data is stored in a dictionary of dictionaries, I can clearly index 
# the area of China. This I store in a variable and then use in the calculation
# to find the percentage. I'll print the result with some print formatting.

china_area = countries['China']['Area']
china_percent = (china_area / total_area) * 100
print("China area is {} which is {:.2f}% of the total area {}".format(china_area, china_percent, total_area))

Total area is 65904909
China area is 9598094 which is 14.56% of the total area 65904909


An alternative to the compact solution above is to use a fully fledged for-loop.

In [12]:
# Alternative code to above

countries = {
    'Russia':   {'Area':17075400, 'Languages':1,             'Population':143895551},
    'Canada':   {'Area':9984670,  'Languages':2,             'Population':37279811},
    'USA':      {'Area':9826675,  'Languages':'No official', 'Population':329536482},
    'China':    {'Area':9598094,  'Languages':1,             'Population':1420062022},
    'Brazil':   {'Area':8514877,  'Languages':1,             'Population':212392717},
    'Australia':{'Area':7617930,  'Languages':1,             'Population':25088636},
    'India':    {'Area':3287263,  'Languages':14,            'Population':1368737513}
}

total_area = 0
for country in countries.keys():
    total_area += countries[country]['Area'] # add the latest country to the total area

print("Total area is {}".format(total_area))

china_area = countries['China']['Area']
china_percent = (china_area / total_area) * 100
print("China area is {} which is {:.2f}% of the total area {}".format(china_area, china_percent, total_area))

Total area is 65904909
China area is 9598094 which is 14.56% of the total area 65904909


Alternative solution using the lists approach.

In [13]:
countries = ['Russia', 'Canada', 'USA', 'China', 'Brazil', 'Australia', 'India']
areas = [17075400, 9984670, 9826675, 9598094, 8514877, 7617930, 3287263]
languages = [1, 2, 'No official', 1, 1, 1, 14]
populations = [143895551, 37279811, 329536482, 1420062022, 212392717, 25088636, 1368737513]

# Q2
# Since my data is stored in lists I will find the total area using the sum()
# function.

total_area = sum(areas)
print("Total area is {}".format(total_area))

# I will then extract the area just for China. I utilise the index() method for
# the countries list to make it clear how I am choosing the index representing
# China; I use this on the areas list.

china_index = countries.index('China')
china_area = areas[china_index]
china_percent = (china_area / total_area) * 100
print("China area is {} which is {:.2f}% of the total area {}".format(china_area, china_percent, total_area))

Total area is 65904909
China area is 9598094 which is 14.56% of the total area 65904909


## Question 3
Extend the code in Question 1 above to find the country with highest and lowest population density (population/land area). Display your values to 2 decimal places.

### Summary comment: 

Most students did this question well. Those who scored high demonstrated that they could use the numpy module efficiently (although the module wasn't required to solve this). 

Some used for loops to keep the code compact. Good use of comments and print statements to show intermediate results, even inside a for loop, demonstrated that the code had been tested well. Very rarely, some students calculated the population density incorrectly. These are semantic errors: the code runs but computes the wrong quantity, so they are worth checking for explicitly.
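
One way to guard against that kind of semantic error is a quick sanity check on a single row. The sketch below uses India's figures from the table; the variable names and the inverted ratio are included only to illustrate the mistake.

```python
# India's figures from the table in Question 1.
population = 1368737513   # people
area = 3287263            # square km

density = population / area      # people per square km (what the question asks for)
inverted = area / population     # square km per person (an easy slip to make)

print("Density: {:.2f} people per square km".format(density))
print("Inverted ratio: {:.5f} square km per person".format(inverted))

# Sanity check: multiplying the density by the area should give back the population.
assert round(density * area) == population
```

A density far below 1 for a populous country is a strong hint that the ratio has been inverted.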

*The model answer for most compact/efficient code is below*

In [14]:
countries = {
    'Russia':   {'Area':17075400, 'Languages':1,             'Population':143895551},
    'Canada':   {'Area':9984670,  'Languages':2,             'Population':37279811},
    'USA':      {'Area':9826675,  'Languages':'No official', 'Population':329536482},
    'China':    {'Area':9598094,  'Languages':1,             'Population':1420062022},
    'Brazil':   {'Area':8514877,  'Languages':1,             'Population':212392717},
    'Australia':{'Area':7617930,  'Languages':1,             'Population':25088636},
    'India':    {'Area':3287263,  'Languages':14,            'Population':1368737513}
}

# Q3
# For each country, I need to calculate the population density. 
# I will store the results back in my data structure as I will use it later.

for country in countries.keys():
    pop_density = countries[country]['Population'] / countries[country]['Area']
    print("Population density for {} is {:.2f}".format(country, pop_density))
    countries[country]['Pop. density'] = pop_density # I store the value in the data structure under a new key

# Now I find the countries with the min and max pop. density. I pass a 'key'
# function to max() and min() that looks up each country's population density.
# That is, I want the functions to return the country name, but compare using
# the population density.

pd_max_country = max(countries.keys(), key=lambda c: countries[c]['Pop. density'])
pd_min_country = min(countries.keys(), key=lambda c: countries[c]['Pop. density'])

print("Country with highest population density is {} at {:.2f}".format(pd_max_country, countries[pd_max_country]['Pop. density']))
print("Country with lowest population density is {} at {:.2f}".format(pd_min_country, countries[pd_min_country]['Pop. density']))

Population density for Russia is 8.43
Population density for Canada is 3.73
Population density for USA is 33.53
Population density for China is 147.95
Population density for Brazil is 24.94
Population density for Australia is 3.29
Population density for India is 416.38
Country with highest population density is India at 416.38
Country with lowest population density is Australia at 3.29


This next block is the same as above, just rewritten to be a little longer and easier to follow.

In [18]:
countries = {
    'Russia':   {'Area':17075400, 'Languages':1,             'Population':143895551},
    'Canada':   {'Area':9984670,  'Languages':2,             'Population':37279811},
    'USA':      {'Area':9826675,  'Languages':'No official', 'Population':329536482},
    'China':    {'Area':9598094,  'Languages':1,             'Population':1420062022},
    'Brazil':   {'Area':8514877,  'Languages':1,             'Population':212392717},
    'Australia':{'Area':7617930,  'Languages':1,             'Population':25088636},
    'India':    {'Area':3287263,  'Languages':14,            'Population':1368737513}
}

# Q3
# For each country, I need to calculate the population density. 
# I will store the results back in my data structure as I will use it later.
# I will calculate and keep track of the max and min as we move through the loop.

max_pd = (0, 'country')
min_pd = (0, 'country')

for country in countries.keys():
    
    # Calculate the pop. den.
    pop_density = countries[country]['Population'] / countries[country]['Area']
    print("\nPopulation density for {} is {:.2f}".format(country, pop_density))
    
    # I store the value in the data structure under a new key
    countries[country]['Pop. density'] = pop_density 
    
    # Update the max if the new value is greater than it
    if pop_density > max_pd[0]:
        max_pd = (pop_density, country)
        print("The max Pop. Den. has been updated!")
    
    # Update the min if the new value is less than it, or if the current record
    # is 0 (because that's what we set it to before the loop)
    if pop_density < min_pd[0] or min_pd[0] == 0:
        min_pd = (pop_density, country)
        print("The min Pop. Den. has been updated!")

print()
print("Country with highest population density is {} at {:.2f}".format(max_pd[1], max_pd[0]))
print("Country with lowest population density is {} at {:.2f}".format(min_pd[1], min_pd[0]))


Population density for Russia is 8.43
The max Pop. Den. has been updated!
The min Pop. Den. has been updated!

Population density for Canada is 3.73
The min Pop. Den. has been updated!

Population density for USA is 33.53
The max Pop. Den. has been updated!

Population density for China is 147.95
The max Pop. Den. has been updated!

Population density for Brazil is 24.94

Population density for Australia is 3.29
The min Pop. Den. has been updated!

Population density for India is 416.38
The max Pop. Den. has been updated!

Country with highest population density is India at 416.38
Country with lowest population density is Australia at 3.29


## Question 4
Write a Python code to find the country with the highest number of official languages while ignoring any countries with undefined or non-numeric values. 

### Summary comment:

Some students hard-coded the solution to this problem. Those who scored high used the `type()` function to identify which country had a text entry for the number of official languages. One can of course replace the text in the original data with `None` or `0`, but then you lose the information in the text string, which may be useful elsewhere. So it is good practice not to edit the data just to make the coding easier.
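
The idea of filtering by type without touching the original data can be sketched as follows. The flattened `languages` dictionary and the names `numeric_only`, `skipped` and `top_country` are assumptions made for this illustration; the sketch only shows one possible shape for the check.

```python
# Hypothetical flattened view of the 'Languages' column from the table.
languages = {
    'Russia': 1, 'Canada': 2, 'USA': 'No official', 'China': 1,
    'Brazil': 1, 'Australia': 1, 'India': 14,
}

# Build a filtered copy that keeps only integer entries.
# The original dictionary is left unchanged, so the text entry is not lost.
numeric_only = {name: count for name, count in languages.items()
                if type(count) is int}

# Record which countries were ignored, as evidence of intermediate testing.
skipped = [name for name in languages if name not in numeric_only]

# max() returns the key whose value is largest when given the lookup as 'key'.
top_country = max(numeric_only, key=numeric_only.get)

print("Skipped (non-numeric):", skipped)
print("Most national languages:", top_country, numeric_only[top_country])
```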

In [21]:
countries = {
    'Russia':   {'Area':17075400, 'Languages':1,             'Population':143895551},
    'Canada':   {'Area':9984670,  'Languages':2,             'Population':37279811},
    'USA':      {'Area':9826675,  'Languages':'No official', 'Population':329536482},
    'China':    {'Area':9598094,  'Languages':1,             'Population':1420062022},
    'Brazil':   {'Area':8514877,  'Languages':1,             'Population':212392717},
    'Australia':{'Area':7617930,  'Languages':1,             'Population':25088636},
    'India':    {'Area':3287263,  'Languages':14,            'Population':1368737513}
}

# Q4
# I will loop through the countries and keep track of which has the highest 
# number of national languages as I go, but I will ignore values which are not 
# integers. Integers are the only valid metric.

max_languages = 0
max_lang_country = None

for country in countries.keys():
    nat_langs = countries[country]['Languages']
    if type(nat_langs) is int and nat_langs > max_languages:
        max_languages = nat_langs
        max_lang_country = country

print("Answer should be India")
print("Result returned {}. It has {} national languages.".format(max_lang_country, max_languages))

Answer should be India
Result returned India. It has 14 national languages.


## Question 5
**Q5.1** Write a function that takes in two country names from the list of countries in the above table as arguments and gives out a decision as to whether the first country has a bigger population than the second country.

**Q5.2** Write a while loop that asks the user to enter two country names and calls the function from Q5.1 until the user wants to quit.

### Summary comment:

Here, those who scored high coded the function correctly, used compact looping methods such as `while True` loops to go through the data, used boolean variables correctly to compare values, and used if-else conditions correctly. They also commented each line and used print statements to display intermediate variables.
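
User input is the part of such a loop that is hardest to test, so it can help to separate the validation from the `input()` call. The helper below, `parse_country_pair`, is a hypothetical name introduced for this sketch and is not taken from any submission; it returns the two names only when the line is usable.

```python
def parse_country_pair(line, known_countries):
    """Return (first, second) if the line names two known countries, else None."""
    words = line.split()
    if len(words) != 2:
        return None   # blank line, one word, or too many words
    if words[0] not in known_countries or words[1] not in known_countries:
        return None   # at least one name is not in the table
    return words[0], words[1]

known = ['Russia', 'Canada', 'USA', 'China', 'Brazil', 'Australia', 'India']

# Try the helper on a valid line and on several invalid ones.
for line in ['India USA', 'India', '', 'India Mars']:
    print(repr(line), '->', parse_country_pair(line, known))
```

Inside a `while True` loop, the result of this helper could decide whether to call the comparison function or to ask the user again.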

In [23]:
countries = {
    'Russia':   {'Area':17075400, 'Languages':1,             'Population':143895551},
    'Canada':   {'Area':9984670,  'Languages':2,             'Population':37279811},
    'USA':      {'Area':9826675,  'Languages':'No official', 'Population':329536482},
    'China':    {'Area':9598094,  'Languages':1,             'Population':1420062022},
    'Brazil':   {'Area':8514877,  'Languages':1,             'Population':212392717},
    'Australia':{'Area':7617930,  'Languages':1,             'Population':25088636},
    'India':    {'Area':3287263,  'Languages':14,            'Population':1368737513}
}

# Q5
# First I write a function that compares the populations of each country. As 
# per the question it will return a boolean about whether the first is larger 
# than the second. It must also take the data structure as an argument

def larger_pop(data, first_country, second_country):
    is_larger = data[first_country]['Population'] > data[second_country]['Population']
    return is_larger # I return the boolean result

# I will now test the function. First I print some values out so I can compare
# them myself manually.

print("India", countries['India']['Population'])
print("USA", countries['USA']['Population'])
print("Canada", countries['Canada']['Population'])

# Now I use the function to print the comparison of some countries.

print("India has more than USA? {}".format(larger_pop(countries, 'India', 'USA')))
print("Canada has more than USA? {}".format(larger_pop(countries, 'Canada', 'USA')))

# Now I write a while loop which will run until the user wants to quit.
# The user can quit by entering 'q' or 'quit'. It will not be case sensitive.

response = [""] # keeps track of the user's input as list of words

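# I use the .lower() method to convert the variable string into the lower case.
# That way the checks are not case sensitive (i.e. 'Q' and 'q' will quit).
# The 'not response' check keeps the loop going if the user enters a blank line,
# which would otherwise make response an empty list and response[0] fail.
while not response or response[0].lower() not in ['q', 'quit']: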
    
    print("\nPlease enter two country names separated by a space (e.g. 'India USA')")
    response = input("Countries: ").split()
    
    if len(response) == 2 and response[0] in countries.keys() and response[1] in countries.keys():
        print("Trying to see if population {0[0]} is greater than {0[1]}".format(response))
        answer = "YES" if larger_pop(countries, response[0], response[1]) else "NO"
        print("The answer is {}".format(answer))   
        
print("Done")

India 1368737513
USA 329536482
Canada 37279811
India has more than USA? True
Canada has more than USA? False

Please enter two country names separated by a space (e.g. 'India USA')
Countries: India

Please enter two country names separated by a space (e.g. 'India USA')
Countries: USA

Please enter two country names separated by a space (e.g. 'India USA')
Countries: India USA
Trying to see if population India is greater than USA
The answer is YES

Please enter two country names separated by a space (e.g. 'India USA')
Countries: q
Done


In [26]:
countries = {
        'Russia':   {'Area':17075400, 'Languages':1,             'Population':143895551},
        'Canada':   {'Area':9984670,  'Languages':2,             'Population':37279811},
        'USA':      {'Area':9826675,  'Languages':'No official', 'Population':329536482},
        'China':    {'Area':9598094,  'Languages':1,             'Population':1420062022},
        'Brazil':   {'Area':8514877,  'Languages':1,             'Population':212392717},
        'Australia':{'Area':7617930,  'Languages':1,             'Population':25088636},
        'India':    {'Area':3287263,  'Languages':14,            'Population':1368737513}
    }

# Alternative method
# Q5.1

def compare_population(a, b, countries):
    decision = countries[a]['Population'] > countries[b]['Population']
    if decision:
        text = a+' has a bigger population than '+b
    else:
        text = a+' does not have a bigger population than '+b
    return text


# Q5.2
print('Answer to question 5.2:')
list_countries = list(countries.keys())
print('The list of countries: {}'.format(list_countries))

while True:
    countryA = input('Please enter the name of your first country: ')
    countryB = input('Please enter the name of your second country: ')

    text_decision = compare_population(countryA, countryB, countries)
    print(text_decision)
    
    response = input('Do you want to continue? y or n? ')
    if response == 'n' or response == 'N':
        break # this will exit the loop

Answer to question 5.2:
The list of countries: ['Russia', 'Canada', 'USA', 'China', 'Brazil', 'Australia', 'India']
Please enter the name of your first country: Russia
Please enter the name of your second country: USA
Russia does not have a bigger population than USA
Do you want to continue? y or n? n


## Question 6
$$ y = x^3 + 2 x^2 + x + 3 $$

**Q6.1** Write symbolic Python code to differentiate the above equation with respect to $x$.

**Q6.2** Hence find the $x$ values at which the slope of $y$ will become zero.

### Summary comment:

Most students did this question well. Those who scored high imported only the required functions from the sympy module, and used comments and print statements to guide the reader.

In [36]:
# Q6.1
# First only import the functions needed from the sympy module
from sympy import symbols, solve, diff

# Then we declare the symbolic variables
x, y = symbols('x y')

# Now define the equation for y in terms of x
y = x**3 + 2*x**2 + x + 3

# Use the diff() function to differentiate y in terms of x, 1 time
dy = diff(y, x, 1)

# Show the results
print('Function: y={}'.format(y))
print('First derivative of y with respect to x is: dy/dx={}'.format(dy))

# Q6.2
# Now we use solve() to find the roots of the derivative. The roots are
# the values of x where dy/dx equals 0.
roots = solve(dy, x)

# Show the results
print('The values of x at which the slope of y becomes zero are: {}'.format(roots))
print("There are {} solutions to dy/dx = 0".format(len(roots)))

# We will write the roots out with more information
for i in range(len(roots)):
    print('At x = {:.2f}, dy = {:.2f} (values to 2 d.p.)'
          .format(roots[i].evalf(), float( dy.subs([(x, roots[i])]) ) ))

Function: y=x**3 + 2*x**2 + x + 3
First derivative of y with respect to x is: dy/dx=3*x**2 + 4*x + 1
The values of x at which the slope of y becomes zero are: [-1, -1/3]
There are 2 solutions to dy/dx = 0
At x = -1.00, dy = 0.00 (values to 2 d.p.)
At x = -0.33, dy = 0.00 (values to 2 d.p.)
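
As a hand check on the symbolic result, the derivative printed above can be factorised directly, which gives the same two roots:

$$ \frac{dy}{dx} = 3x^2 + 4x + 1 = (3x + 1)(x + 1) = 0 \quad \Rightarrow \quad x = -1 \ \text{ or } \ x = -\tfrac{1}{3} $$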
