# A Brief Introduction to Pandas -- Exercises

## Goal

Practice `pandas` basics and working with `DataFrame` objects.

## Exercises

### 1. How do you import the `pandas` package (use community standards)?

In [1]:
import pandas as pd

### 2. One way to make a `DataFrame` is to create one from a Python dictionary.  Create a dictionary that mirrors the table below, and use this to create a `pandas` `DataFrame`.

| last_name  | first_name | age | major       | graduation_year |
| ---        | ---        | --- | ---         | ---             |
| Snow       | John       | 23  | English     | 2025            |
| Targaryen  | Daenerys   | 23  | Mathematics | 2024            |
| Lannister  | Tyrion     | 39  | Economics   | 2024            |
| Stark      | Arya       | 18  | History     | 2023            |

In [2]:
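# Each key is a column name; each list holds that column's values, one per row.
# No backslashes are needed: line breaks inside braces continue the statement.
myDict = {"last_name": ["Snow", "Targaryen", "Lannister", "Stark"],
          "first_name": ["John", "Daenerys", "Tyrion", "Arya"],
          "age": [23, 23, 39, 18],
          "major": ["English", "Mathematics", "Economics", "History"],
          "graduation_year": [2025, 2024, 2024, 2023]}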

myDF = pd.DataFrame(myDict)
myDF

|     | last_name  | first_name | age | major       | graduation_year |
| --- | ---        | ---        | --- | ---         | ---             |
| 0   | Snow       | John       | 23  | English     | 2025            |
| 1   | Targaryen  | Daenerys   | 23  | Mathematics | 2024            |
| 2   | Lannister  | Tyrion     | 39  | Economics   | 2024            |
| 3   | Stark      | Arya       | 18  | History     | 2023            |
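The exercise calls the dictionary approach "one way" to build a `DataFrame`. As a hedged illustration of another, the sketch below builds the same table from a list of row dictionaries. The names `records` and `df_from_records` are illustrative and not part of the original exercise.

```python
import pandas as pd

# Row-oriented input: one dictionary per row, keyed by column name.
records = [
    {"last_name": "Snow", "first_name": "John", "age": 23,
     "major": "English", "graduation_year": 2025},
    {"last_name": "Targaryen", "first_name": "Daenerys", "age": 23,
     "major": "Mathematics", "graduation_year": 2024},
    {"last_name": "Lannister", "first_name": "Tyrion", "age": 39,
     "major": "Economics", "graduation_year": 2024},
    {"last_name": "Stark", "first_name": "Arya", "age": 18,
     "major": "History", "graduation_year": 2023},
]

df_from_records = pd.DataFrame(records)
print(df_from_records.shape)  # (4, 5)
```

The column-oriented dictionary is usually more compact when the data already arrive as columns; the row-oriented form can be convenient when rows are collected one at a time.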


### 3. Which `pandas` attributes and methods give basic information about the `DataFrame` above?
- Number of rows and columns (single command)
- Number of rows and columns, individually
- The names of the columns
- The data types for each of the columns
- Overall summary of the `DataFrame` (using one command)

In [3]:
# shape gives the number of rows and columns as a (rows, columns) tuple
myDF.shape

# You can get rows and columns individually in several ways
rows, cols = myDF.shape
rows_again = myDF.shape[0]
cols_again = myDF.shape[1]

# Column names
myDF.columns
list(myDF.columns)

# Data types of the columns
myDF.dtypes

# Overall summary
myDF.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 5 columns):
last_name          4 non-null object
first_name         4 non-null object
age                4 non-null int64
major              4 non-null object
graduation_year    4 non-null int64
dtypes: int64(2), object(3)
memory usage: 240.0+ bytes
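A notebook cell displays only the value of its last expression, which is why only the `info()` summary appears above. The sketch below is one way to see every result: it rebuilds the table from exercise 2 under the illustrative name `df` and prints each piece explicitly. The exact dtype names printed may differ between `pandas` versions.

```python
import pandas as pd

# Rebuild the exercise 2 table so this block runs on its own.
df = pd.DataFrame({
    "last_name": ["Snow", "Targaryen", "Lannister", "Stark"],
    "first_name": ["John", "Daenerys", "Tyrion", "Arya"],
    "age": [23, 23, 39, 18],
    "major": ["English", "Mathematics", "Economics", "History"],
    "graduation_year": [2025, 2024, 2024, 2023],
})

print(df.shape)            # (4, 5)

# Rows and columns individually: unpack the tuple, or use len()
n_rows, n_cols = df.shape
print(len(df))             # 4 (number of rows)
print(len(df.columns))     # 5 (number of columns)

print(list(df.columns))    # column names as a plain Python list
print(df.dtypes)           # one dtype per column
df.info()                  # prints its summary directly; no print() needed
```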


### 4. Compute the sum of the `age` column using `pandas` operations on the `DataFrame` (don't calculate directly)

In [4]:
# First index the column by name, then compute the sum
myDF["age"].sum()

103
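The same select-then-aggregate pattern applies to other summary methods. A minimal sketch, using only the `age` values from the table above (the name `ages` is illustrative):

```python
import pandas as pd

ages = pd.DataFrame({"age": [23, 23, 39, 18]})

# Select the column as a Series, then call an aggregation method on it.
total = ages["age"].sum()
average = ages["age"].mean()

print(total)    # 103
print(average)  # 25.75
```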

### 5. Find the (direct) link to a .csv file online and use `pandas` to read this data into Python

Try searching the web for a topic that is likely to have open data (e.g., weather).

If you're really stuck, try looking around here:
https://earthquake.usgs.gov/earthquakes/feed/v1.0/csv.php

In [5]:
csv_url = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/1.0_hour.csv"
myDat = pd.read_csv(csv_url)
myDat

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
0,2020-07-28T04:07:10.984Z,62.8816,-148.1778,61.5,1.0,ml,,,,0.75,...,2020-07-28T04:12:40.122Z,"68 km SE of Cantwell, Alaska",earthquake,,1.0,,,automatic,ak,ak
1,2020-07-28T04:00:17.743Z,61.3825,-151.3207,75.0,1.9,ml,,,,0.45,...,2020-07-28T04:08:58.903Z,"29 km NNW of Beluga, Alaska",earthquake,,0.5,,,automatic,ak,ak
2,2020-07-28T03:50:42.508Z,59.9603,-147.8642,26.4,1.8,ml,,,,0.91,...,2020-07-28T03:59:36.127Z,"14 km SE of Chenega, Alaska",earthquake,,0.3,,,automatic,ak,ak
3,2020-07-28T03:48:59.359Z,36.296898,-98.183731,6.240216,1.5,ml,20.0,90.949974,0.262981,0.628814,...,2020-07-28T03:51:31.215Z,"5 km N of Ames, Oklahoma",earthquake,3.154718,4.113808,,15.0,automatic,ok,ok
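The live feed changes every few minutes, so results will differ from the output shown here. As an offline sketch of the same `pd.read_csv` call, the example below parses a small in-memory sample instead of a URL. The sample keeps only six of the feed's columns and three of the rows shown above; `sample_csv` and `quakes` are illustrative names, and `parse_dates` is an optional extra that the exercise does not require.

```python
import io

import pandas as pd

# A few rows copied from the output above, reduced to six columns.
# The place names contain commas, so they are quoted as in the feed.
sample_csv = io.StringIO(
    "time,latitude,longitude,depth,mag,place\n"
    '2020-07-28T04:07:10.984Z,62.8816,-148.1778,61.5,1.0,"68 km SE of Cantwell, Alaska"\n'
    '2020-07-28T04:00:17.743Z,61.3825,-151.3207,75.0,1.9,"29 km NNW of Beluga, Alaska"\n'
    '2020-07-28T03:50:42.508Z,59.9603,-147.8642,26.4,1.8,"14 km SE of Chenega, Alaska"\n'
)

# read_csv accepts a URL, a file path, or a file-like object such as StringIO.
quakes = pd.read_csv(sample_csv, parse_dates=["time"])

print(quakes.shape)         # (3, 6)
print(quakes["mag"].max())  # 1.9
```

Passing `parse_dates=["time"]` asks `pandas` to convert the timestamp strings into datetime values, which makes later sorting and filtering by time easier.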
