# Data Wrangling With Pandas

## Task One - Series
In the cell below, create a `pandas` Series that contains the populations of the top ten British cities. You can find the necessary data in `data/urban.md`. Print out the series.

In [3]:
import pandas as pd

cities = pd.Series(
    {
        "London": 10803000,
        "Birmingham": 2517000,
        "Manchester": 2449000,
        "Leeds-Bradford": 1659000,
        "Glasgow": 1100000,
        "Liverpool": 835000,
        "Southampton-Portsmouth": 805000,
        "Newcastle": 719000,
        "Nottingham": 719000,
        "Sheffield": 603000,
    }
)

cities

London                    10803000
Birmingham                 2517000
Manchester                 2449000
Leeds-Bradford             1659000
Glasgow                    1100000
Liverpool                   835000
Southampton-Portsmouth      805000
Newcastle                   719000
Nottingham                  719000
Sheffield                   603000
dtype: int64
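The Series above is built from a dictionary, so its keys become the index. Later cells build Series from plain lists and pass the labels through `index=` instead. The sketch below uses three of the populations above to suggest that the two forms produce the same Series.

```python
import pandas as pd

# From a dict: keys become the index labels, in insertion order.
from_dict = pd.Series({"London": 10803000, "Birmingham": 2517000, "Manchester": 2449000})

# From parallel lists: values first, labels passed via the index= parameter.
from_lists = pd.Series(
    [10803000, 2517000, 2449000],
    index=["London", "Birmingham", "Manchester"],
)

print(from_dict.equals(from_lists))  # True
print(from_dict["Birmingham"])       # lookup by label: 2517000
print(from_dict.iloc[0])             # lookup by position: 10803000
```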

Sort the series into alphabetical order, creating a new series, and print it out.

In [9]:
alphabetised_cities = cities.sort_index()
alphabetised_cities

Birmingham                 2517000
Glasgow                    1100000
Leeds-Bradford             1659000
Liverpool                   835000
London                    10803000
Manchester                 2449000
Newcastle                   719000
Nottingham                  719000
Sheffield                   603000
Southampton-Portsmouth      805000
dtype: int64
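`sort_index()` orders the Series by its labels (the city names). For comparison, here is a minimal sketch using four of the cities above that contrasts it with `sort_values()`, which orders by the populations themselves. Neither call modifies the original Series; each returns a new one.

```python
import pandas as pd

populations = pd.Series(
    {"London": 10803000, "Glasgow": 1100000, "Birmingham": 2517000, "Sheffield": 603000}
)

# Sort by index labels (alphabetical city names).
by_name = populations.sort_index()

# Sort by the values instead; ascending=False puts the largest city first.
by_size = populations.sort_values(ascending=False)

print(list(by_name.index))      # ['Birmingham', 'Glasgow', 'London', 'Sheffield']
print(list(by_size.index))      # ['London', 'Birmingham', 'Glasgow', 'Sheffield']
print(list(populations.index))  # original order unchanged
```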

Create and print a series consisting of the second half of the alphabetically sorted series.

In [15]:
# Label-based alternative: alphabetised_cities.loc['Manchester':'Southampton-Portsmouth']
# Positional slice from the midpoint; // is integer division
alphabetised_cities.iloc[len(alphabetised_cities) // 2:]


Manchester                2449000
Newcastle                  719000
Nottingham                 719000
Sheffield                  603000
Southampton-Portsmouth     805000
dtype: int64
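Positional and label-based slicing treat the end point differently. `.iloc` excludes the stop position, as Python lists do, while `.loc` includes both end labels. The sketch below assumes a six-city subset of the alphabetised data.

```python
import pandas as pd

alphabetised = pd.Series(
    {"Birmingham": 2517000, "Glasgow": 1100000, "Leeds-Bradford": 1659000,
     "Liverpool": 835000, "London": 10803000, "Manchester": 2449000}
).sort_index()

# Integer division gives the midpoint without an int(...) cast.
half = len(alphabetised) // 2

# Positional slice: from the midpoint to the end.
second_half = alphabetised.iloc[half:]

# Label slice: both 'Liverpool' and 'Manchester' are included.
same_rows = alphabetised.loc["Liverpool":"Manchester"]

print(second_half.index.tolist())     # ['Liverpool', 'London', 'Manchester']
print(second_half.equals(same_rows))  # True
```

For an odd number of cities, `// 2` rounds down, so the extra city ends up in the second half.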

## Task Two - Dataframes

In the cell below, create three new Series for:
- the top attractions in each city
- whether the city is a port
- the geographical area of the city in square km

From all the series, create a DataFrame that contains all the information about the cities. Include a unique id for each city. Print out the resulting dataframe.

In [116]:
# Top attraction in each city, listed in the same order as cities.index
attractions = [
    "Big Ben",
    "Cadbury World",
    "Northcoders",
    "Royal Armouries",
    "Kelvingrove",
    "Albert Dock",
    "Naval Dockyard",
    "Bigg Market",
    "Nottingham Castle",
    "Botanical Gardens",
]
top_attraction = pd.Series(attractions, index=cities.index)

# Whether each city is a port
ports = [
    "Yes",
    "No",
    "No",
    "No",
    "Yes",
    "Yes",
    "Yes",
    "Yes",
    "No",
    "No",
]
port = pd.Series(ports, index=cities.index)
print(port)

# Geographical area of each city in square km
areas = [
    1737.9,
    598.9,
    630.3,
    487.8,
    368.5,
    199.6,
    192.0,
    180.0,
    176.4,
    167.5,
]
area = pd.Series(areas, index=cities.index)

# Combine the Series into one DataFrame; rows are aligned on the city names
cities_data = pd.DataFrame({"attraction": top_attraction, "port": port, "area": area})

London                    Yes
Birmingham                 No
Manchester                 No
Leeds-Bradford             No
Glasgow                   Yes
Liverpool                 Yes
Southampton-Portsmouth    Yes
Newcastle                 Yes
Nottingham                 No
Sheffield                  No
dtype: object
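Passing a dictionary of Series to `pd.DataFrame` aligns the rows by index label rather than by position. The sketch below uses a three-city subset and deliberately lists the attractions in a different order to show this. It also shows one possible way, `DataFrame.insert`, to add the unique id the task asks for.

```python
import pandas as pd

cities = ["London", "Glasgow", "Leeds-Bradford"]
port = pd.Series(["Yes", "Yes", "No"], index=cities)
area = pd.Series([1737.9, 368.5, 487.8], index=cities)

# Same labels in a different order: values are matched by label, not position.
attraction = pd.Series(
    ["Royal Armouries", "Big Ben", "Kelvingrove"],
    index=["Leeds-Bradford", "London", "Glasgow"],
)

df = pd.DataFrame({"attraction": attraction, "port": port, "area": area})

# Add a unique integer id (0 to n-1) as the first column.
df.insert(0, "id", range(len(df)))

print(df)
```

Because alignment is by label, a city missing from one of the Series would get `NaN` in that column; the other values would not shift.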


Create and print a dataframe containing the top attractions in cities that are ports. Include only the city id and name.

In [117]:
# Print the combined DataFrame from the previous cell
print(cities_data)

# Give each city a unique integer id (0 to 9)
cities_data['id'] = range(len(cities_data))

# Keep only the port cities, and only their attraction and id columns
cities_data.loc[cities_data['port'] == 'Yes', ['attraction', 'id']]

                               attraction port    area
London                            Big Ben  Yes  1737.9
Birmingham                  Cadbury World   No   598.9
Manchester                    Northcoders   No   630.3
Leeds-Bradford            Royal Armouries   No   487.8
Glasgow                       Kelvingrove  Yes   368.5
Liverpool                     Albert Dock  Yes   199.6
Southampton-Portsmouth     Naval Dockyard  Yes   192.0
Newcastle                     Bigg Market  Yes   180.0
Nottingham              Nottingham Castle   No   176.4
Sheffield               Botanical Gardens   No   167.5


                            attraction  id
London                         Big Ben   0
Glasgow                    Kelvingrove   4
Liverpool                  Albert Dock   5
Southampton-Portsmouth  Naval Dockyard   6
Newcastle                  Bigg Market   7
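The selection above combines a boolean row mask and a column list in a single `.loc[rows, columns]` call. Here is a minimal sketch on a three-city subset that makes the two steps explicit.

```python
import pandas as pd

cities_data = pd.DataFrame(
    {
        "attraction": ["Big Ben", "Cadbury World", "Kelvingrove"],
        "port": ["Yes", "No", "Yes"],
        "id": [0, 1, 4],
    },
    index=["London", "Birmingham", "Glasgow"],
)

# The comparison produces a boolean Series aligned on the city index.
is_port = cities_data["port"] == "Yes"

# .loc[mask, columns] filters rows and picks columns in one step,
# avoiding chained indexing such as cities_data[is_port][["attraction", "id"]].
port_attractions = cities_data.loc[is_port, ["attraction", "id"]]
print(port_attractions)
```

Storing the port flag as a boolean (`True`/`False`) instead of `"Yes"`/`"No"` would let the column itself serve as the mask. That is a possible design choice, not what the notebook does.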


Create a dataframe that includes all the original data plus the population density (population per square km) of each city, and sort the rows from lowest to highest density.

In [119]:
# `cities` (the population Series from Task One) is already defined, so it is
# not redefined here; assignment aligns it with cities_data on the city names
cities_data['population'] = cities

# Population per square km
cities_data['densities'] = cities_data['population'] / cities_data['area']

# sort_values() is ascending by default: lowest density first
cities_data.sort_values('densities')

                               attraction port    area  id  population    densities
Glasgow                       Kelvingrove  Yes   368.5   4     1100000  2985.074627
Leeds-Bradford            Royal Armouries   No   487.8   3     1659000  3400.984010
Sheffield               Botanical Gardens   No   167.5   9      603000  3600.000000
Manchester                    Northcoders   No   630.3   2     2449000  3885.451372
Newcastle                     Bigg Market  Yes   180.0   7      719000  3994.444444
Nottingham              Nottingham Castle   No   176.4   8      719000  4075.963719
Liverpool                     Albert Dock  Yes   199.6   5      835000  4183.366733
Southampton-Portsmouth     Naval Dockyard  Yes   192.0   6      805000  4192.708333
Birmingham                  Cadbury World   No   598.9   1     2517000  4202.704959
London                            Big Ben  Yes  1737.9   0    10803000  6216.122907
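The same result can be produced without modifying `cities_data` in place by using `DataFrame.assign`. Within one `assign` call, a later keyword such as the density lambda can refer to a column added earlier in the same call. This sketch assumes a three-city subset. The column name `density` is an illustrative choice, not the notebook's `densities`.

```python
import pandas as pd

# Three-city subset of the data above
cities_data = pd.DataFrame(
    {"area": [1737.9, 368.5, 167.5]},
    index=["London", "Glasgow", "Sheffield"],
)
population = pd.Series({"Sheffield": 603000, "London": 10803000, "Glasgow": 1100000})

# assign() returns a new DataFrame and leaves cities_data unchanged.
# population is matched to rows by city name, so its order does not matter;
# the lambda receives the frame with population already added.
with_density = cities_data.assign(
    population=population,
    density=lambda df: df["population"] / df["area"],
)

# Ascending by default: lowest density first.
ordered = with_density.sort_values("density")
print(ordered)
```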


In [13]:
doughnuts_df = pd.read_json("data/doughnuts.json", orient='records')

In [14]:
doughnuts_df

                                       doughnut_data
0  {'doughnut_type': 'Choccy Delight', 'price': 1...
1  {'doughnut_type': 'Strawberry Haze', 'price': ...
2  {'doughnut_type': 'Sprinkly Bonanza', 'price':...
3  {'doughnut_type': 'Nutty Heaven', 'price': 1.2...
4  {'doughnut_type': 'Caramel Caress', 'price': 1...
5  {'doughnut_type': 'Delectable Delights', 'pric...
6  {'doughnut_type': 'Banana Bonanza', 'price': 1...
7  {'doughnut_type': 'Marshmallow Marsh', 'price'...
8  {'doughnut_type': 'Rocky Road', 'price': 2.22,...
9  {'doughnut_type': 'Biscoff Gourmet', 'price': ...
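Each cell of `doughnut_data` appears to hold a dictionary, so the useful fields are still nested inside one column. One way to flatten them into ordinary columns is `pd.json_normalize`. The frame below is a hypothetical stand-in: only the `'Rocky Road'` row at 2.22 comes from the output above. The second row is made up, and the real records have further fields that are truncated here.

```python
import pandas as pd

# Hypothetical stand-in for doughnuts_df: a single column whose cells are dicts.
doughnuts_df = pd.DataFrame(
    {
        "doughnut_data": [
            {"doughnut_type": "Rocky Road", "price": 2.22},
            {"doughnut_type": "Example Glaze", "price": 1.50},  # invented row
        ]
    }
)

# json_normalize turns each dict into a row, with one column per key
# (nested keys, if any, become dotted column names).
flat = pd.json_normalize(doughnuts_df["doughnut_data"].tolist())
print(flat)
```

If the cells turn out to be strings rather than dicts, they would need parsing first (for example with `ast.literal_eval`). Check this assumption against the actual file.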
