# Data Wrangling With Pandas

## Task One - Series
In the cell below, create a `pandas` Series that contains the populations of the top ten British cities. You can find the necessary data in `data/urban.md`. Print out the series.

In [18]:
import pandas as pd

city_population = {
    "London": 10803000,
    "Birmingham": 2517000,
    "Manchester": 2449000,
    "Leeds-Bradford": 1659000,
    "Glasgow": 1100000,
    "Liverpool": 835000,
    "Southampton-Portsmouth": 805000,
    "Newcastle": 719000,
    "Nottingham": 719000,
    "Sheffield": 603000
}
population = pd.Series(city_population, name='population')
print(population)

Sort the series alphabetically by city name, storing the result in a new series, and print it out.

In [19]:
sorted_population = population.sort_index()
print(sorted_population)

Birmingham                 2517000
Glasgow                    1100000
Leeds-Bradford             1659000
Liverpool                   835000
London                    10803000
Manchester                 2449000
Newcastle                   719000
Nottingham                  719000
Sheffield                   603000
Southampton-Portsmouth      805000
Name: population, dtype: int64
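
As a minimal sketch of the difference between sorting by label and sorting by value, the snippet below uses a three-city subset of the data above (the variable names `by_name` and `by_size` are illustrative, not part of the task). It also shows that `sort_index()` returns a new Series and leaves the original untouched.

```python
import pandas as pd

# Three-city subset of the data above, kept small for illustration.
population = pd.Series(
    {"London": 10803000, "Birmingham": 2517000, "Glasgow": 1100000},
    name="population",
)

by_name = population.sort_index()    # alphabetical by city label
by_size = population.sort_values()   # ascending by population

print(list(by_name.index))     # labels in alphabetical order
print(list(by_size.index))     # labels from smallest to largest city
print(list(population.index))  # original insertion order is unchanged
```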

Create and print a series consisting of the second half of the alphabetically-sorted series.

In [20]:
# Slice from the midpoint of the alphabetically sorted series to the end
second_half = population.sort_index().iloc[len(population) // 2:]
print(second_half)

Manchester                2449000
Newcastle                  719000
Nottingham                 719000
Sheffield                  603000
Southampton-Portsmouth     805000
Name: population, dtype: int64
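
One way to take the second half of a sorted series, sketched here on a four-city subset with illustrative variable names, is positional slicing with `.iloc`. Label slicing with `.loc` also works on a sorted index, and it includes the starting label.

```python
import pandas as pd

# Four-city subset of the data above, kept small for illustration.
population = pd.Series(
    {"London": 10803000, "Leeds-Bradford": 1659000,
     "Glasgow": 1100000, "Sheffield": 603000},
    name="population",
)
sorted_population = population.sort_index()

# Positional slice: everything from the midpoint onwards.
midpoint = len(sorted_population) // 2
second_half = sorted_population.iloc[midpoint:]

# Label slice on the sorted index: starts at "London", inclusive.
from_london = sorted_population.loc["London":]

print(second_half)
```

With an odd number of rows, `len(...) // 2` rounds down, so the second half receives the extra row.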

## Task Two - Dataframes

In the cell below, create three new Series for
- the top attractions in each city
- whether the city is a port
- the geographical area of the city in square km

From all the series, create a DataFrame that contains all the information about the cities. Include a unique id for each city. Print out the resulting DataFrame.

In [21]:
data = {
    'London': 'Big Ben',
    'Birmingham': 'Cadbury World',
    'Manchester': 'Northcoders',
    'Leeds-Bradford': 'Royal Armouries',
    'Glasgow': 'Kelvingrove',
    'Liverpool': 'Albert Dock',
    'Southampton-Portsmouth': 'Naval Dockyard',
    'Newcastle': 'Bigg Market',
    'Nottingham': 'Nottingham Castle',
    'Sheffield': 'Botanical Gardens'
}
top_attractions = pd.Series(data, name='Top Attraction')

data = {
    'London': 'Yes',
    'Birmingham': 'No',
    'Manchester': 'No',
    'Leeds-Bradford': 'No',
    'Glasgow': 'Yes',
    'Liverpool': 'Yes',
    'Southampton-Portsmouth': 'Yes',
    'Newcastle': 'Yes',
    'Nottingham': 'No',
    'Sheffield': 'No'
}
is_port = pd.Series(data, name='Is a port?')

data = {
    'London': 1737.9,
    'Birmingham': 598.9,
    'Manchester': 630.3,
    'Leeds-Bradford': 487.8,
    'Glasgow': 368.5,
    'Liverpool': 199.6,
    'Southampton-Portsmouth': 192,
    'Newcastle': 180.0,
    'Nottingham': 176.4,
    'Sheffield': 167.5
}
area_series = pd.Series(data, name='Area (square km)')

df = pd.DataFrame({
    'City': top_attractions.index,
    'Population': population.values,
    'Top Attraction': top_attractions.values,
    'Is a port?': is_port.values,
    'Area (square km)': area_series.values
})

print(df)

                     City  Population     Top Attraction Is a port?  \
0                  London    10803000            Big Ben        Yes   
1              Birmingham     2517000      Cadbury World         No   
2              Manchester     2449000        Northcoders         No   
3          Leeds-Bradford     1659000    Royal Armouries         No   
4                 Glasgow     1100000        Kelvingrove        Yes   
5               Liverpool      835000        Albert Dock        Yes   
6  Southampton-Portsmouth      805000     Naval Dockyard        Yes   
7               Newcastle      719000        Bigg Market        Yes   
8              Nottingham      719000  Nottingham Castle         No   
9               Sheffield      603000  Botanical Gardens         No   

   Area (square km)  
0            1737.9  
1             598.9  
2             630.3  
3             487.8  
4             368.5  
5             199.6  
6             192.0  
7             180.0  
8             176.4  
9             167.5  
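
The cell above pairs the columns by position through `.values`, which relies on every dictionary listing the cities in the same order. A hedged alternative, shown on a three-city subset with illustrative variable names, is to let pandas align the Series on their shared city labels with `pd.concat` and then call `reset_index` so that the default integer index acts as the unique id.

```python
import pandas as pd

# Three-city subset; the second Series lists the cities in a different order.
population = pd.Series(
    {"London": 10803000, "Birmingham": 2517000, "Glasgow": 1100000},
    name="Population",
)
is_port = pd.Series(
    {"Glasgow": "Yes", "London": "Yes", "Birmingham": "No"},
    name="Is a port?",
)

# axis=1 places each Series in its own column, matching rows by city label.
cities = pd.concat([population, is_port], axis=1)

# Move the city labels into a column; the new integer index serves as the id.
cities = cities.rename_axis("City").reset_index()
print(cities)
```

Because rows are matched by label rather than by position, a dictionary written in a different order cannot silently attach a value to the wrong city.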

Create and print a dataframe containing the top attractions in cities that are ports. Include only the city id and name.

In [22]:
query1 = df[df['Is a port?'] == 'Yes']
print(query1['Top Attraction'])

0           Big Ben
4       Kelvingrove
5       Albert Dock
6    Naval Dockyard
7       Bigg Market
Name: Top Attraction, dtype: object
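
The cell above prints a Series. If a DataFrame holding the city name alongside its attraction is wanted, one possible sketch (three-city subset, illustrative variable names) passes a boolean mask and a list of column labels to `.loc`. The row index then carries the city id.

```python
import pandas as pd

# Three-city subset of the DataFrame built above.
df = pd.DataFrame({
    "City": ["London", "Birmingham", "Glasgow"],
    "Top Attraction": ["Big Ben", "Cadbury World", "Kelvingrove"],
    "Is a port?": ["Yes", "No", "Yes"],
})

# Boolean mask selects port rows; the column list keeps only name and attraction.
is_port_mask = df["Is a port?"] == "Yes"
port_attractions = df.loc[is_port_mask, ["City", "Top Attraction"]]
print(port_attractions)
```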


Create a dataframe that includes all the original data plus the population density (population per square km) of each city, ordered from lowest to highest density.

In [23]:
df['Population Density (per square km)'] = df['Population'] / df['Area (square km)']
print(df.sort_values('Population Density (per square km)'))

                     City  Population     Top Attraction Is a port?  \
4                 Glasgow     1100000        Kelvingrove        Yes   
3          Leeds-Bradford     1659000    Royal Armouries         No   
9               Sheffield      603000  Botanical Gardens         No   
2              Manchester     2449000        Northcoders         No   
7               Newcastle      719000        Bigg Market        Yes   
8              Nottingham      719000  Nottingham Castle         No   
5               Liverpool      835000        Albert Dock        Yes   
6  Southampton-Portsmouth      805000     Naval Dockyard        Yes   
1              Birmingham     2517000      Cadbury World         No   
0                  London    10803000            Big Ben        Yes   

   Area (square km)  Population Density (per square km)  
4             368.5                         2985.074627  
3             487.8                         3400.984010  
9             167.5                         