# Introduction to Pandas: A Short Tutorial

* Pandas is an open source, BSD-licensed library
* High-performance, easy-to-use data structures and data analysis tools
* Built on top of NumPy, and provides an efficient implementation of a DataFrame
* Makes data analysis fast and easy in Python

**DataFrame**: A two-dimensional array with attached row and column labels
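
As a small illustration of that definition, the sketch below builds a DataFrame with explicit row and column labels. The values and label names are made up for illustration and are not part of the exercises.

```python
import pandas as pd

# Hypothetical toy values, used only to illustrate the structure
toy = pd.DataFrame(
    {"Height": [1.2, 0.9], "Weight": [30, 22]},
    index=["row_a", "row_b"],
)

print(toy.shape)                   # (2, 2): two rows, two columns
print(toy.loc["row_b", "Weight"])  # look up one value by its row and column labels
```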

### **Important Information**

(1) To answer these exercises, you **must first read Chapter 3: Data Manipulation with Pandas from the Python Data Science Handbook** (https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html)


**Pandas API Reference**: https://pandas.pydata.org/pandas-docs/stable/reference/index.html

#### **This is an introduction to Pandas. Try to complete the tutorial yourself, and check the model answers for help when needed.**

In [None]:
# Import Pandas
import pandas as pd

In [None]:
# Create two lists with information from Baby names in England and Wales: 2018
# https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/
names  = ['Adam', 'Sophie', 'Charlie', 'Anna', 'Bobby', 'Florence', 'George', 'Mia']
births = [1508,   1929,      3336,     409,     652,    1974,        4949,    2418]

# Merge these two lists together using the zip function
# https://docs.python.org/3.3/library/functions.html
babiesDataSet = list(zip(names, births))

In [None]:
# Print the combined list
babiesDataSet

In [None]:
# Use Pandas to create a dataframe
df = pd.DataFrame(data=babiesDataSet, columns=['Name', 'Births'])

In [None]:
# Display the dataframe
df

In [None]:
# Export the dataframe to csv
# You can find the CSV file in the same directory as this Jupyter Notebook
# (If you are using Google Colaboratory, it is saved in the session's virtual directory)
df.to_csv("birthsUK2018.csv", index=False, header=False)

In [None]:
# Import data to dataframe
file = "birthsUK2018.csv"  # path is relative to the current working directory
births = pd.read_csv(file, header=None, names=['Name', 'Births'])
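
The export and import cells above are a matched pair: the file is written without a header row, so it has to be read back with `header=None` and explicit `names`. The sketch below shows what happens otherwise. It uses made-up toy data and an in-memory buffer instead of a file so that it does not touch `birthsUK2018.csv`.

```python
import io
import pandas as pd

# Hypothetical toy data, written to a string instead of a file
toy = pd.DataFrame({"Name": ["Ann", "Bob"], "Score": [1, 2]})
csv_text = toy.to_csv(index=False, header=False)

# Without header=None, the first data row is consumed as the column names
wrong = pd.read_csv(io.StringIO(csv_text))
right = pd.read_csv(io.StringIO(csv_text), header=None, names=["Name", "Score"])

print(len(wrong))  # 1: one row was lost to the header
print(len(right))  # 2: both rows kept
```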

In [None]:
# Show the dataframe
# The numbers 0 to 7 in the first column are the index of the dataframe.
births

In [None]:
# Check the data types of columns
births.dtypes

In [None]:
# Get general info about the dataframe
# - There are 8 records in the data set
# - There is a column named "Name" of type object (non numeric) with 8 values
# - There is a column named "Births" of type numeric with 8 values
births.info()

In [None]:
# Check the data type of the Births column
births.Births.dtype

In [None]:
# Print the top 5 rows
# Write your code here


In [None]:
# Print the last 3 rows
# Write your code here


In [None]:
# Print the name of columns
# Write your code here


In [None]:
# Transform the dataframe into an array
# Write your code here


In [None]:
# Get the index of the dataframe
# Write your code here


In [None]:
# Access the entries of the Name column (using dataframe slicing)
# Write your code here


In [None]:
# Access the entries of the Births column as a property
# Write your code here


In [None]:
# Find the maximum number of births
# Write your code here


In [None]:
# Get the name associated with the max births
# Write your code here
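
One possible approach to this kind of lookup, sketched on made-up toy data rather than the births dataframe: `idxmax` returns the index label of the largest value, and `loc` then reads another column at that label.

```python
import pandas as pd

# Hypothetical toy data, not the exercise dataset
toy = pd.DataFrame({"Item": ["a", "b", "c"], "Count": [5, 9, 7]})

top_label = toy["Count"].idxmax()  # index label of the row with the largest Count
print(toy.loc[top_label, "Item"])  # b
```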


In [None]:
# Find the unique names
# Write your code here


In [None]:
# Get some descriptive statistics for the number of births
# Write your code here


In [None]:
# Get the names with births more than 2000
# Write your code here


In [None]:
# Get the names starting with "A"
# Write your code here
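
String tests on a column go through the `.str` accessor, which applies a string method to every entry and returns a boolean Series that can be used as a filter. A minimal sketch on made-up toy data:

```python
import pandas as pd

# Hypothetical toy data, not the exercise dataset
toy = pd.DataFrame({"Item": ["apple", "banana", "avocado"]})

starts_with_a = toy["Item"].str.startswith("a")  # one True/False per row
print(starts_with_a.tolist())                    # [True, False, True]
print(toy.loc[starts_with_a, "Item"].tolist())   # ['apple', 'avocado']
```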


In [None]:
# Add another column with the country set for all rows as UK
# Tip: Use the NumPy function repeat (if needed)
# Write your code here
import numpy as np


In [None]:
# Add a column with the gender of the babies
# Assume the genders are alternating "Male", "Female"
# e.g., Adam -> Male, Sophie -> Female, Charlie -> Male, Anna -> Female etc
# Tip: Use the NumPy function tile if needed
# Write your code here
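
The tips in this exercise and the previous one name two NumPy helpers that are easy to confuse. The sketch below contrasts them on a placeholder two-element list: `np.repeat` repeats each element in place, while `np.tile` repeats the whole sequence.

```python
import numpy as np

# Placeholder values chosen only to show the ordering of the results
repeated = np.repeat(["a", "b"], 3)  # each element repeated 3 times
tiled = np.tile(["a", "b"], 3)       # the whole sequence repeated 3 times

print(repeated.tolist())  # ['a', 'a', 'a', 'b', 'b', 'b']
print(tiled.tolist())     # ['a', 'b', 'a', 'b', 'a', 'b']
```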


In [None]:
# Add a column that indicates for each name its percentage over the total births
# Write your code here
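
A percentage column is plain column arithmetic: each value divided by the column total, times 100. Pandas applies the operation to the whole column at once. A minimal sketch on made-up numbers (the column names are hypothetical):

```python
import pandas as pd

# Hypothetical toy data, not the exercise dataset
toy = pd.DataFrame({"Count": [1, 1, 2]})

# Divide every entry by the column total, then scale to a percentage
toy["Share"] = toy["Count"] / toy["Count"].sum() * 100
print(toy["Share"].tolist())  # [25.0, 25.0, 50.0]
```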


In [None]:
# Delete the country column
# Write your code here


In [None]:
# Create a new dataframe containing only the columns Name, Births, and Percentage, then print it
# Write your code here


In [None]:
# Subset the data based on index location to get the first 3 records
# Tip: Use the iloc property of the dataframe
# Write your code here
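
`iloc` selects by integer position, whereas `loc` selects by label, and the two treat the end of a slice differently. The sketch below uses a made-up frame with letter labels so the difference is visible.

```python
import pandas as pd

# Hypothetical toy frame with non-default row labels
toy = pd.DataFrame({"Value": [10, 20, 30, 40]}, index=["w", "x", "y", "z"])

by_position = toy.iloc[1:3]  # positions 1 and 2; the end position is excluded
by_label = toy.loc["x":"y"]  # labels 'x' through 'y'; the end label is included

print(by_position.index.tolist())  # ['x', 'y']
print(by_label.index.tolist())     # ['x', 'y']
```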


In [None]:
# Get the names that belong to female babies using the query function of a dataframe
# Tip: Inside a query string, the '@' prefix refers to a Python variable
# Write your code here
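
The `query` method takes a string expression in which column names are written bare and a leading `@` refers to a Python variable. A minimal sketch on made-up toy data (the column names and the variable `wanted` are hypothetical):

```python
import pandas as pd

# Hypothetical toy data, not the exercise dataset
toy = pd.DataFrame({"Colour": ["red", "blue", "red"], "Count": [5, 7, 9]})

wanted = "red"
# Colour and Count are columns; @wanted is the Python variable defined above
matches = toy.query("Colour == @wanted and Count > 6")
print(matches["Count"].tolist())  # [9]
```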


In [None]:
# Get the names that belong to female babies by slicing 
# Write your code here


In [None]:
# Get the names whose births are below 1000 and are male using the query function of a dataframe
# Write your code here


In [None]:
# Get the names whose births are below 1000 and are male by slicing
# Write your code here
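
When filtering with boolean masks instead of `query`, each condition needs its own parentheses, and conditions are combined with `&` (and) or `|` (or) rather than the Python keywords. A sketch on made-up toy data:

```python
import pandas as pd

# Hypothetical toy data, not the exercise dataset
toy = pd.DataFrame({"Colour": ["red", "blue", "red"], "Count": [5, 7, 9]})

# Each comparison yields a boolean Series; the parentheses are required
# because & binds more tightly than == and >
mask = (toy["Colour"] == "red") & (toy["Count"] > 6)
print(toy.loc[mask, "Count"].tolist())  # [9]
```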


In [None]:
# Get the number of births grouped by gender
# Write your code here
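
Grouping follows a split, apply, combine pattern: rows are split by the values of a key column, an aggregate is computed per group, and the results are combined into one object indexed by the key. A minimal sketch on made-up toy data:

```python
import pandas as pd

# Hypothetical toy data, not the exercise dataset
toy = pd.DataFrame({"Colour": ["red", "blue", "red"], "Count": [5, 7, 9]})

# One total per distinct Colour; the result is a Series indexed by Colour
totals = toy.groupby("Colour")["Count"].sum()
print(totals["red"])   # 14
print(totals["blue"])  # 7
```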


In [None]:
# Sort the dataframe by name
# Write your code here


In [None]:
# Sort the dataframe by the number of births in descending order
# Write your code here


In [None]:
# Add a column with the county each child was born in, as given by the list below
county = ['Yorkshire', 'Essex', 'Yorkshire', 'Yorkshire', 'Kent', 'Kent', 'Yorkshire', 'Essex']
# Write your code here


In [None]:
# Group the data based on Gender and then by County
# Write your code here
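
Passing a list of columns to `groupby` groups by the first column and then by the second within each first-level group, which gives a result with a two-level index. A sketch on made-up toy data (column names are hypothetical):

```python
import pandas as pd

# Hypothetical toy data, not the exercise dataset
toy = pd.DataFrame({
    "Colour": ["red", "blue", "red", "red"],
    "Size": ["S", "S", "L", "S"],
    "Count": [5, 7, 9, 1],
})

# The result is indexed by (Colour, Size) pairs
totals = toy.groupby(["Colour", "Size"])["Count"].sum()
print(totals.index.tolist())      # [('blue', 'S'), ('red', 'L'), ('red', 'S')]
print(totals.loc[("red", "S")])   # 6
```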
