# Introduction to Pandas

Q: What is pandas? 🐼

Q: Why do we use pandas?

Q: What built-in Python data structures do you know?

### 1. Getting Started
- We first import pandas and load a table into a DataFrame.

In [2]:
import pandas as pd

In [7]:
penguins = pd.read_csv("penguins_simple.csv", sep=";")
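
The `sep=";"` argument matters because `read_csv` splits on commas by default. The sketch below shows the difference on a tiny in-memory sample. The sample text and its values are made up for illustration and are not taken from `penguins_simple.csv`.

```python
import io

import pandas as pd

# Hypothetical two-row sample in a semicolon-separated layout (made-up values).
csv_text = (
    "Species;Body Mass (g);Sex\n"
    "Adelie;3750.0;MALE\n"
    "Gentoo;5200.0;FEMALE\n"
)

# With the matching separator, each field becomes its own column.
good = pd.read_csv(io.StringIO(csv_text), sep=";")

# With the default separator (a comma), every line is read as one field.
bad = pd.read_csv(io.StringIO(csv_text))

print(good.shape)  # (2, 3)
print(bad.shape)   # (2, 1)
```

If a freshly loaded table has only one column, a mismatched separator is a likely cause.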

### 2. Working with DataFrames
- To view the contents of a DataFrame, type its name as the last line of a cell and run the cell.
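
A notebook renders only the value of the last expression in a cell. The sketch below, on a small made-up DataFrame (the name `toy` and its values are hypothetical), shows `print` as an alternative when the DataFrame is not on the last line or when the code runs as a plain script.

```python
import pandas as pd

# Hypothetical miniature table with made-up values.
toy = pd.DataFrame(
    {
        "Species": ["Adelie", "Gentoo"],
        "Body Mass (g)": [3750.0, 5200.0],
    }
)

# Works anywhere, including scripts and the middle of a cell.
print(toy)

# In a notebook, a bare name on the last line is rendered as a table.
toy
```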

### 3. Examining DataFrames
Match the Python commands with the descriptions below. 

*In Jupyter, you can drag and drop cells up or down — hover just to the left of the cell to try.*

In [8]:
penguins.head(3)

,Species,Culmen Length (mm),Culmen Depth (mm),Flipper Length (mm),Body Mass (g),Sex
0,Adelie,39.1,18.7,181.0,3750.0,MALE
1,Adelie,39.5,17.4,186.0,3800.0,FEMALE
2,Adelie,40.3,18.0,195.0,3250.0,FEMALE


In [9]:
penguins.tail(3)

,Species,Culmen Length (mm),Culmen Depth (mm),Flipper Length (mm),Body Mass (g),Sex
330,Gentoo,50.4,15.7,222.0,5750.0,MALE
331,Gentoo,45.2,14.8,212.0,5200.0,FEMALE
332,Gentoo,49.9,16.1,213.0,5400.0,MALE


In [10]:
penguins.describe()

,Culmen Length (mm),Culmen Depth (mm),Flipper Length (mm),Body Mass (g)
count,333.0,333.0,333.0,333.0
mean,43.992793,17.164865,200.966967,4207.057057
std,5.468668,1.969235,14.015765,805.215802
min,32.1,13.1,172.0,2700.0
25%,39.5,15.6,190.0,3550.0
50%,44.5,17.3,197.0,4050.0
75%,48.6,18.7,213.0,4775.0
max,59.6,21.5,231.0,6300.0


In [11]:
penguins['Culmen Length (mm)'].mean()

43.992792792792805

In [12]:
penguins['Species'].value_counts()

Adelie       146
Gentoo       119
Chinstrap     68
Name: Species, dtype: int64

In [13]:
penguins.shape  

(333, 6)

In [14]:
penguins['Species'].unique()

array(['Adelie', 'Chinstrap', 'Gentoo'], dtype=object)

In [15]:
penguins['Body Mass (g)'] / 100

0      37.50
1      38.00
2      32.50
3      34.50
4      36.50
       ...  
328    49.25
329    48.50
330    57.50
331    52.00
332    54.00
Name: Body Mass (g), Length: 333, dtype: float64
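
Arithmetic on a column returns a new Series and leaves the DataFrame untouched. A minimal sketch on made-up values follows; the `toy` DataFrame and the column name `Body Mass (kg)` are hypothetical and not part of the penguin data.

```python
import pandas as pd

# Hypothetical three-row table with made-up values.
toy = pd.DataFrame({"Body Mass (g)": [3750.0, 3800.0, 3250.0]})

# The division is applied to every value and yields a new Series.
scaled = toy["Body Mass (g)"] / 100

# The original column is unchanged at this point.
print(toy["Body Mass (g)"].tolist())  # [3750.0, 3800.0, 3250.0]

# To keep a result, assign it to a column.
toy["Body Mass (kg)"] = toy["Body Mass (g)"] / 1000
print(toy)
```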

**Number of rows and columns**

**Mean of a column**

**Show the last 3 lines**

**Summarize categorical data**

**Summarize all numerical columns**

**Show the first 3 lines**

**Apply a calculation to each value in a column**

**Extract distinct values**

### 4. Selecting Rows and Columns
Match the Python commands with the descriptions below. 

In [16]:
penguins.columns

Index(['Species', 'Culmen Length (mm)', 'Culmen Depth (mm)',
       'Flipper Length (mm)', 'Body Mass (g)', 'Sex'],
      dtype='object')

In [17]:
penguins.index

RangeIndex(start=0, stop=333, step=1)

In [18]:
penguins['Flipper Length (mm)']

0      181.0
1      186.0
2      195.0
3      193.0
4      190.0
       ...  
328    214.0
329    215.0
330    222.0
331    212.0
332    213.0
Name: Flipper Length (mm), Length: 333, dtype: float64

In [None]:
penguins[['Flipper Length (mm)', 'Body Mass (g)']]

In [19]:
penguins.iloc[3:7]

,Species,Culmen Length (mm),Culmen Depth (mm),Flipper Length (mm),Body Mass (g),Sex
3,Adelie,36.7,19.3,193.0,3450.0,FEMALE
4,Adelie,39.3,20.6,190.0,3650.0,MALE
5,Adelie,38.9,17.8,181.0,3625.0,FEMALE
6,Adelie,39.2,19.6,195.0,4675.0,MALE
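
Note that `iloc[3:7]` stops before position 7, like a Python list slice. The label-based counterpart `loc` includes the end label. A minimal sketch on a made-up DataFrame (the name `toy` and the column `value` are hypothetical):

```python
import pandas as pd

# Hypothetical table with a default integer index 0..9.
toy = pd.DataFrame({"value": range(10)})

# Position-based slicing: the end position is excluded.
by_position = toy.iloc[3:7]
print(list(by_position.index))  # [3, 4, 5, 6]

# Label-based slicing: the end label is included.
by_label = toy.loc[3:7]
print(list(by_label.index))  # [3, 4, 5, 6, 7]
```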


In [20]:
penguins[penguins['Body Mass (g)'] > 5000]

,Species,Culmen Length (mm),Culmen Depth (mm),Flipper Length (mm),Body Mass (g),Sex
215,Gentoo,50.0,16.3,230.0,5700.0,MALE
217,Gentoo,50.0,15.2,218.0,5700.0,MALE
218,Gentoo,47.6,14.5,215.0,5400.0,MALE
221,Gentoo,46.7,15.3,219.0,5200.0,MALE
223,Gentoo,46.8,15.4,215.0,5150.0,MALE
...,...,...,...,...,...,...
326,Gentoo,55.1,16.0,230.0,5850.0,MALE
327,Gentoo,48.8,16.2,222.0,6000.0,MALE
330,Gentoo,50.4,15.7,222.0,5750.0,MALE
331,Gentoo,45.2,14.8,212.0,5200.0,FEMALE
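
The expression inside the brackets is itself a Series of `True`/`False` values, one per row. The sketch below, on made-up values (the `toy` DataFrame is hypothetical), shows that mask and one way to combine two conditions. Each condition needs its own parentheses, and `&` is used instead of `and`.

```python
import pandas as pd

# Hypothetical four-row table with made-up values.
toy = pd.DataFrame(
    {
        "Species": ["Adelie", "Gentoo", "Gentoo", "Chinstrap"],
        "Body Mass (g)": [3750.0, 5700.0, 5200.0, 3800.0],
        "Sex": ["MALE", "MALE", "FEMALE", "FEMALE"],
    }
)

# The comparison produces one boolean per row.
mask = toy["Body Mass (g)"] > 5000
print(mask.tolist())  # [False, True, True, False]

# Indexing with the mask keeps only the rows marked True.
heavy = toy[mask]

# Two conditions combined with & (element-wise "and").
heavy_females = toy[(toy["Body Mass (g)"] > 5000) & (toy["Sex"] == "FEMALE")]
print(heavy_females)
```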


In [None]:
penguins.values
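
The pandas documentation recommends `DataFrame.to_numpy()` over `.values` for the same purpose. A minimal sketch on made-up values (the `toy` DataFrame is hypothetical) also shows that a table mixing text and numbers yields an array of generic Python objects, while a purely numeric selection keeps a numeric dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row table with made-up values.
toy = pd.DataFrame(
    {
        "Species": ["Adelie", "Gentoo"],
        "Body Mass (g)": [3750.0, 5200.0],
    }
)

# Mixed text and numbers: the array falls back to dtype object.
mixed = toy.to_numpy()
print(mixed.dtype)

# Numeric columns only: the array keeps a float dtype.
numeric = toy[["Body Mass (g)"]].to_numpy()
print(numeric.dtype, numeric.shape)
```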

**Extract raw data as a NumPy array**

**Select rows by slicing the index**

**Filter rows by a condition**

**Display column labels**

**Select multiple columns**

**Display row index**

**Select one column**

### 5. Summarizing Data
Match the Python commands with the descriptions below. 

In [None]:
penguins['Body Mass (g)'].cumsum()

In [None]:
penguins.groupby('Sex')['Body Mass (g)'].sum()
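
A `groupby` splits the rows by the values of one column, then applies the calculation to each group separately. The sketch below uses made-up values (the `toy` DataFrame is hypothetical) and also shows `agg`, which can compute several statistics per group in one call.

```python
import pandas as pd

# Hypothetical four-row table with made-up values.
toy = pd.DataFrame(
    {
        "Sex": ["MALE", "FEMALE", "MALE", "FEMALE"],
        "Body Mass (g)": [3750.0, 3800.0, 5700.0, 5200.0],
    }
)

# One sum per distinct value of "Sex"; the group labels become the index.
totals = toy.groupby("Sex")["Body Mass (g)"].sum()
print(totals)

# Several statistics per group at once.
summary = toy.groupby("Sex")["Body Mass (g)"].agg(["sum", "mean", "count"])
print(summary)
```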

In [None]:
penguins.sort_values(by=['Species', 'Body Mass (g)'])
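
When `by` is a list, the rows are ordered by the first column and ties are broken by the second. `sort_values` returns a new DataFrame rather than reordering the original. A minimal sketch on made-up values (the `toy` DataFrame is hypothetical) that also shows a per-column sort direction:

```python
import pandas as pd

# Hypothetical four-row table with made-up values.
toy = pd.DataFrame(
    {
        "Species": ["Gentoo", "Adelie", "Gentoo", "Adelie"],
        "Body Mass (g)": [5200.0, 3750.0, 5700.0, 3800.0],
    }
)

# Both columns ascending (the default).
ascending = toy.sort_values(by=["Species", "Body Mass (g)"])
print(list(ascending.index))  # [1, 3, 0, 2]

# Species ascending, body mass descending within each species.
mixed = toy.sort_values(by=["Species", "Body Mass (g)"], ascending=[True, False])
print(list(mixed.index))  # [3, 1, 2, 0]

# The original row order is unchanged.
print(list(toy.index))  # [0, 1, 2, 3]
```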

In [None]:
def get_initial(s):
    return s[0]

penguins['initial'] = penguins['Species'].apply(get_initial)
penguins
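
For simple string operations, pandas also offers the `.str` accessor, which avoids writing a separate function. The sketch below compares the two approaches on made-up values. The `toy` DataFrame is hypothetical, and `get_initial` is repeated here only to make the example self-contained.

```python
import pandas as pd

# Hypothetical three-row table with made-up values.
toy = pd.DataFrame({"Species": ["Adelie", "Chinstrap", "Gentoo"]})


def get_initial(s):
    """Return the first character of a string."""
    return s[0]


# apply() calls the function once per value.
via_apply = toy["Species"].apply(get_initial)

# The .str accessor indexes into every string in the column.
via_str = toy["Species"].str[0]

print(via_apply.tolist())  # ['A', 'C', 'G']
print(via_str.tolist())    # ['A', 'C', 'G']
```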

In [None]:
penguins.stack()

In [None]:
penguins.transpose()

In [None]:
penguins['Body Mass (g)'].hist()

In [None]:
penguins.plot(x='Culmen Depth (mm)', y='Culmen Length (mm)', style='ro')

**Draw a scatterplot**

**Move columns to a new index level**

**Create a new column using a function**

**Draw a histogram**

**Cumulatively apply a sum over a column**

**Swap rows and columns**

**Calculate sum of one column grouped by a second one**

**Sort values**

### 6. Writing to Disk