# Data Wrangling With Pandas

## Task One - Series
In the cell below, create a `pandas` Series that contains the populations of the top ten British cities. You can find the necessary data in `data/urban.md`. Print out the series.

In [40]:
import pandas as pd

In [41]:
populations = pd.Series([10803000, 2517000, 2449000, 1659000, 1100000, 835000, 805000, 719000, 719000, 603000], index=["London", "Birmingham", "Manchester", "Leeds-Bradford", "Glasgow", "Liverpool", "Southampton-Portsmouth", "Newcastle", "Nottingham", "Sheffield"])

populations

London                    10803000
Birmingham                 2517000
Manchester                 2449000
Leeds-Bradford             1659000
Glasgow                    1100000
Liverpool                   835000
Southampton-Portsmouth      805000
Newcastle                   719000
Nottingham                  719000
Sheffield                   603000
dtype: int64
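Passing values and labels as two separate lists works, but a label can silently slip out of step with its value. A minimal sketch of an alternative, assuming a dict is acceptable for the exercise (the names `sample` and `london_population` are illustrative and not part of the task):

```python
import pandas as pd

# Each city sits next to its population, so labels and values cannot drift apart
sample = pd.Series(
    {"London": 10803000, "Birmingham": 2517000, "Manchester": 2449000},
    name="population",
)

# Label-based lookup on the Series index
london_population = sample["London"]
```

The dict keys become the index in insertion order, so the result has the same shape as a Series built from two lists.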

Sort the series alphabetically by city name, creating a new series, and print it out.

In [42]:
sorted = populations.sort_index()

sorted

Birmingham                 2517000
Glasgow                    1100000
Leeds-Bradford             1659000
Liverpool                   835000
London                    10803000
Manchester                 2449000
Newcastle                   719000
Nottingham                  719000
Sheffield                   603000
Southampton-Portsmouth      805000
dtype: int64
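`sort_index()` orders a Series by its labels and returns a new object, while `sort_values()` orders by the data. A small sketch of the difference on three of the cities above (the names `sample`, `by_name` and `by_size` are illustrative):

```python
import pandas as pd

sample = pd.Series({"London": 10803000, "Birmingham": 2517000, "Manchester": 2449000})

# Sort by label (alphabetical); the original series is left unchanged
by_name = sample.sort_index()

# Sort by value, largest first
by_size = sample.sort_values(ascending=False)
```

The sketch avoids the variable name `sorted`, so that Python's built-in `sorted()` function stays available in later cells.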

Create and print a series consisting of the second half of the alphabetically-sorted series.

In [43]:
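# Slice from the midpoint; tail() with no argument always returns the last five rows
second_half_sorted = sorted.iloc[len(sorted) // 2:]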

second_half_sorted

Manchester                2449000
Newcastle                  719000
Nottingham                 719000
Sheffield                  603000
Southampton-Portsmouth     805000
dtype: int64
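Called with no argument, `tail()` returns the last five rows, so it equals "the second half" only when the series has exactly ten entries. A minimal sketch of a length-independent slice, using six of the cities above (the names `six_cities`, `midpoint`, `second_half` and `last_five` are illustrative):

```python
import pandas as pd

six_cities = pd.Series(
    {
        "London": 10803000,
        "Birmingham": 2517000,
        "Manchester": 2449000,
        "Leeds-Bradford": 1659000,
        "Glasgow": 1100000,
        "Liverpool": 835000,
    }
).sort_index()

# Integer division finds the midpoint; iloc slices by position
midpoint = len(six_cities) // 2
second_half = six_cities.iloc[midpoint:]

# For comparison: tail() defaults to n=5, which is not half of six
last_five = six_cities.tail()
```

With an odd number of rows, this slice puts the extra row in the second half.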

## Task Two - Dataframes

In the cell below, create three new Series for
- the top attractions in each city
- whether the city is a port
- the geographical area of the city in square km

From all the series, create a DataFrame that contains all the information about the cities. Include a unique id for each city. Print out the resulting DataFrame.

In [44]:
attractions_series = pd.Series(["Big Ben", "Cadbury World", "Northcoders", "Royal Armouries", "Kelvingrove", "Albert Dock", "Naval Dockyard", "Bigg Market", "Nottingham castle", "Botanical Gardens"], index=["London", "Birmingham", "Manchester", "Leeds-Bradford", "Glasgow", "Liverpool", "Southampton-Portsmouth", "Newcastle", "Nottingham", "Sheffield"])

attractions_series

London                              Big Ben
Birmingham                    Cadbury World
Manchester                      Northcoders
Leeds-Bradford              Royal Armouries
Glasgow                         Kelvingrove
Liverpool                       Albert Dock
Southampton-Portsmouth       Naval Dockyard
Newcastle                       Bigg Market
Nottingham                Nottingham castle
Sheffield                 Botanical Gardens
dtype: object

In [45]:
is_port_series = pd.Series(["Yes", "No", "No", "No", "Yes", "Yes", "Yes", "Yes", "No", "No"], index=["London", "Birmingham", "Manchester", "Leeds-Bradford", "Glasgow", "Liverpool", "Southampton-Portsmouth", "Newcastle", "Nottingham", "Sheffield"])

is_port_series

London                    Yes
Birmingham                 No
Manchester                 No
Leeds-Bradford             No
Glasgow                   Yes
Liverpool                 Yes
Southampton-Portsmouth    Yes
Newcastle                 Yes
Nottingham                 No
Sheffield                  No
dtype: object

In [46]:
geo_area_series = pd.Series([1737.9, 598.9, 630.3, 487.8, 368.5, 199.6, 192, 182.0, 176.4, 167.5], index=["London", "Birmingham", "Manchester", "Leeds-Bradford", "Glasgow", "Liverpool", "Southampton-Portsmouth", "Newcastle", "Nottingham", "Sheffield"])

geo_area_series

London                    1737.9
Birmingham                 598.9
Manchester                 630.3
Leeds-Bradford             487.8
Glasgow                    368.5
Liverpool                  199.6
Southampton-Portsmouth     192.0
Newcastle                  182.0
Nottingham                 176.4
Sheffield                  167.5
dtype: float64

In [47]:
cities = pd.DataFrame({"top_attraction": attractions_series, "is_port": is_port_series, "geographical_area (sq km)": geo_area_series})

cities = cities.reset_index(names=["city_name"])

cities

|   | city_name | top_attraction | is_port | geographical_area (sq km) |
|---|---|---|---|---|
| 0 | London | Big Ben | Yes | 1737.9 |
| 1 | Birmingham | Cadbury World | No | 598.9 |
| 2 | Manchester | Northcoders | No | 630.3 |
| 3 | Leeds-Bradford | Royal Armouries | No | 487.8 |
| 4 | Glasgow | Kelvingrove | Yes | 368.5 |
| 5 | Liverpool | Albert Dock | Yes | 199.6 |
| 6 | Southampton-Portsmouth | Naval Dockyard | Yes | 192.0 |
| 7 | Newcastle | Bigg Market | Yes | 182.0 |
| 8 | Nottingham | Nottingham castle | No | 176.4 |
| 9 | Sheffield | Botanical Gardens | No | 167.5 |
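When a DataFrame is built from a dict of Series, pandas matches rows by index label rather than by position, which is why every Series above carries the city names as its index. A small sketch under that assumption, with two cities listed in a different order in each Series (the names `area`, `is_port`, `frame` and `by_city` are illustrative). It uses `rename_axis(...).reset_index()`, which should behave like `reset_index(names=...)` on pandas versions that predate the `names` parameter:

```python
import pandas as pd

area = pd.Series({"London": 1737.9, "Glasgow": 368.5})
is_port = pd.Series({"Glasgow": "Yes", "London": "Yes"})  # deliberately a different order

# Rows are aligned on the city labels, not on list position
frame = pd.DataFrame({"area": area, "is_port": is_port})

# Move the city names into a column; the new 0..n-1 index acts as a unique id
frame = frame.rename_axis("city_name").reset_index()

# Look values up by city to confirm the alignment
by_city = frame.set_index("city_name")
```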


Create and print a dataframe containing the top attractions in cities that are ports. Alongside each attraction, include only the city id and name.

In [48]:
ports = cities.loc[cities["is_port"] == "Yes", ["city_name", "top_attraction"]]

ports

|   | city_name | top_attraction |
|---|---|---|
| 0 | London | Big Ben |
| 4 | Glasgow | Kelvingrove |
| 5 | Liverpool | Albert Dock |
| 6 | Southampton-Portsmouth | Naval Dockyard |
| 7 | Newcastle | Bigg Market |
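The filter above combines a boolean mask for the rows with a list of column names in a single `.loc` call. A minimal sketch of the same pattern on three rows, with the mask pulled out into its own variable (the names `sample`, `port_mask` and `sample_ports` are illustrative):

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "city_name": ["London", "Birmingham", "Glasgow"],
        "top_attraction": ["Big Ben", "Cadbury World", "Kelvingrove"],
        "is_port": ["Yes", "No", "Yes"],
    }
)

# Element-wise comparison yields a boolean Series, one entry per row
port_mask = sample["is_port"] == "Yes"

# .loc[rows, columns]: keep rows where the mask is True and only the listed columns
sample_ports = sample.loc[port_mask, ["city_name", "top_attraction"]]
```

The filtered frame keeps the original index values, so each row's id survives the selection.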


Create a dataframe that includes all the original data plus the population density (population per square km) of each city, sorted from lowest to highest density.

In [52]:
# Look up each city's population by name in the `populations` series so values stay aligned with their rows
cities["population_density (pop / sq km)"] = cities["city_name"].map(populations) / cities["geographical_area (sq km)"]

cities_by_pop_dens = cities.sort_values(by=["population_density (pop / sq km)"])

cities_by_pop_dens

|   | city_name | top_attraction | is_port | geographical_area (sq km) | population_density (pop / sq km) |
|---|---|---|---|---|---|
| 4 | Glasgow | Kelvingrove | Yes | 368.5 | 2985.074627 |
| 3 | Leeds-Bradford | Royal Armouries | No | 487.8 | 3400.98401 |
| 9 | Sheffield | Botanical Gardens | No | 167.5 | 3600.0 |
| 2 | Manchester | Northcoders | No | 630.3 | 3885.451372 |
| 7 | Newcastle | Bigg Market | Yes | 182.0 | 3950.549451 |
| 8 | Nottingham | Nottingham castle | No | 176.4 | 4075.963719 |
| 5 | Liverpool | Albert Dock | Yes | 199.6 | 4183.366733 |
| 6 | Southampton-Portsmouth | Naval Dockyard | Yes | 192.0 | 4192.708333 |
| 1 | Birmingham | Cadbury World | No | 598.9 | 4202.704959 |
| 0 | London | Big Ben | Yes | 1737.9 | 6216.122907 |
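Dividing one Series by another pairs the values by index label, so a density calculation stays correct even when the two inputs are ordered differently. A small sketch of that behaviour with two of the cities above (the names `population`, `area` and `density` are illustrative):

```python
import pandas as pd

population = pd.Series({"London": 10803000, "Glasgow": 1100000})
area = pd.Series({"Glasgow": 368.5, "London": 1737.9})  # deliberately a different order

# Series arithmetic aligns on labels: London is divided by London's area, and so on
density = population / area

# Ascending sort puts the least dense city first
density = density.sort_values()
```

A plain Python list has no labels, so dividing a list by a Series relies on the two being in the same row order.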
