Copyright (c) 2025 aamirmd. All Rights Reserved.

This work is licensed under the MIT License. See LICENSE file for details.

# Pandas tutorial

Welcome!

Prerequisites:
- Basic Python (variables, types, lists, indexing, dicts, etc.)
- NumPy (preferred)

## Installing and importing Pandas

The official documentation is available here: https://pandas.pydata.org/

In [None]:
!pip install pandas

This notebook can be run on Google Colab without installing Python on your computer.

Please follow the instructions here: [Google Colab Instructions](https://medium.com/@jessica0greene/running-your-notebooks-in-the-cloud-with-google-colab-4387529bfad4)

In [2]:
import pandas as pd

## Series and DataFrames

- Series and DataFrames are the two main data structures in pandas.
- A Series can be thought of as a 1-dimensional labeled list or array.
- A DataFrame can be thought of as a 2-dimensional table.

In [None]:
# Creating a Series
s = pd.Series([1, 2, 3])
print("Series example:")
display(s)

# Creating a DataFrame from a dict that maps column names to values
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
print("DataFrame example:")
display(df)
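
To make the "1-dimensional" and "2-dimensional" descriptions concrete, here is a minimal sketch that inspects the same kind of objects with `ndim` and `shape`. It uses `print` instead of `display` so that it also runs outside a notebook; the variable name `col_a` is only illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# A Series has one dimension; a DataFrame has two (rows, columns).
print(s.ndim, s.shape)    # 1 (3,)
print(df.ndim, df.shape)  # 2 (3, 3)

# Selecting a single column from a DataFrame gives back a Series.
col_a = df["A"]
print(type(col_a).__name__)  # Series
```

Each column of a DataFrame is itself a Series, which is why the two structures share most of their indexing behavior.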

## Indexing and Slicing

- In pandas data structures, elements are looked up by the index labels provided, unlike regular Python lists, which are indexed by position.
    * _Caveat_: As of this tutorial, Python-list-style (positional) indexing is still allowed, but it emits a warning.
- If no index is provided, as in the previous example, the Series/DataFrame uses the default index, which runs from 0 to n-1, where n is the number of elements (rows, for a DataFrame).
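
The difference between labels and positions can be sketched with the explicit accessors `.loc` (label-based) and `.iloc` (position-based), which avoid the ambiguity mentioned in the caveat. The variable names below are illustrative, not part of the tutorial's own code.

```python
import pandas as pd

# No index given: pandas assigns the default labels 0..n-1.
default_s = pd.Series([10, 20, 30])
print(list(default_s.index))  # [0, 1, 2]

# Custom integer labels: the label 900 is not the position 900.
labeled_s = pd.Series(["d", "e", "f"], index=[215, 168, 900])
by_label = labeled_s.loc[900]    # look up the element labeled 900
by_position = labeled_s.iloc[0]  # look up the first element
print(by_label, by_position)  # f d
```

Using `.loc` and `.iloc` states the intent explicitly, so the code does not depend on how plain `s[...]` resolves integer keys.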

In [None]:
# Indexing example
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
print("Series: ")
display(s)
print(f"Element 'b': {s['b']}")

# Another indexing example
s = pd.Series(['d', 'e', 'f'], index=[215, 168, 900])
print("Series: ")
display(s)
print(f"Element 900: {s[900]}")

# Slicing example
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print("Slice from 'b' to 'd': ")
display(s['b':'d'])
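
As a small sketch of how slicing behaves on this Series: label-based slices include both endpoints, while position-based slices exclude the stop position, as Python lists do. The names `label_slice` and `position_slice` are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# Label-based slice: both endpoints ('b' and 'd') are included.
label_slice = s.loc["b":"d"]
print(label_slice.tolist())  # [2, 3, 4]

# Position-based slice: the stop position is excluded, as with lists.
position_slice = s.iloc[1:3]
print(position_slice.tolist())  # [2, 3]
```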

## Data loading

- pandas can load data from many formats, such as:
    * CSV
    * JSON
    * Excel
- CSV is typically used more than the others.
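
The readers for these formats follow the same pattern: `pd.read_csv`, `pd.read_json`, and `pd.read_excel`. The sketch below reads CSV and JSON from in-memory strings so that it runs without any files; the sample data (`name`, `score`) is made up for illustration. `pd.read_excel` is left out because it typically needs an additional engine package, such as `openpyxl`, to be installed.

```python
import io

import pandas as pd

# Hypothetical in-memory data standing in for files on disk.
csv_text = "name,score\nalice,90\nbob,85\n"
json_text = '[{"name": "alice", "score": 90}, {"name": "bob", "score": 85}]'

# Each reader accepts a path or a file-like object and returns a DataFrame.
df_csv = pd.read_csv(io.StringIO(csv_text))
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv.shape, df_json.shape)  # (2, 2) (2, 2)
print(df_csv["score"].tolist())     # [90, 85]
```

With real files, the `io.StringIO(...)` argument would simply be replaced by the file path.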

In [3]:
path_to_data = "data/iris.csv"
df = pd.read_csv(path_to_data)
df

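| Unnamed: 0 | sepal.length | sepal.width | petal.length | petal.width | variety |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Setosa |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | Virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | Virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | Virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | Virginica |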
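
The `Unnamed: 0` column in the output above usually appears when a CSV file stores a saved row index in a first column with an empty header. The sketch below assumes that this is how `data/iris.csv` is laid out (an assumption, since the file itself is not shown) and reuses the first three rows of the output as in-memory data. Passing `index_col=0` tells `pd.read_csv` to use that column as the index instead of as data.

```python
import io

import pandas as pd

# First rows of the output above, with an empty first header field
# (assumed layout of the file; the real file is not shown here).
csv_text = (
    ",sepal.length,sepal.width,petal.length,petal.width,variety\n"
    "0,5.1,3.5,1.4,0.2,Setosa\n"
    "1,4.9,3.0,1.4,0.2,Setosa\n"
    "2,4.7,3.2,1.3,0.2,Setosa\n"
)

# Default behavior: the unnamed column is kept as a data column.
plain = pd.read_csv(io.StringIO(csv_text))
print(plain.columns[0])  # Unnamed: 0

# index_col=0 uses the first column as the row index instead.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed.columns))
```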
