# Pandas 101: By <a target="_blank" href="https://medium.com/@niraj.e21/pandas-101-dccdc78c2248">Niraj Tiwari</a>

### 📌 Table of Contents:
1. <a href="#1">Introduction to Pandas</a>
2. <a href="#2">Data Importing and Exporting</a>
3. <a href="#3">Basic Data Operations</a>
4. <a href="#4">Data Cleaning</a>
5. <a href="#5">Basic Data Analysis</a>
6. <a href="#6">Conditional Selections</a>
7. <a href="#7">Data Transformation</a>

<hr>

<p id="1"></p>

## **1. Introduction to Pandas**
Pandas is an open-source library in Python used for data manipulation and analysis. It offers data structures and operations for manipulating numerical tables and time series. It's particularly well-suited for handling structured data, i.e., data that is organized into tables.

**Series and DataFrames:** These are the two primary data structures in Pandas.

- **A Series** is a one-dimensional, array-like structure in which each value is paired with an index label.
- **A DataFrame** is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure.

**Example 1: Creating a Series**

In [1]:
import pandas as pd
s = pd.Series([1, 3, 5, 7, 9])
print(s)

0    1
1    3
2    5
3    7
4    9
dtype: int64


**Example 2: Creating a DataFrame**

In [2]:
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 34, 29, 32]}
df = pd.DataFrame(data)
print(df)

    Name  Age
0   John   28
1   Anna   34
2  Peter   29
3  Linda   32
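
The two structures are closely related: selecting one column of a DataFrame gives back a Series. The short sketch below reuses the sample data from Example 2 to illustrate this, along with the "heterogeneous" and "two-dimensional" properties mentioned above. The variable name `age_column` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 34, 29, 32]})

# Each column of a DataFrame is itself a Series
age_column = df['Age']
print(type(age_column).__name__)  # Series

# Columns can hold different data types (text in 'Name', integers in 'Age')
print(df.dtypes)

# shape is (number of rows, number of columns)
print(df.shape)  # (4, 2)
```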


<hr>

<p id="2"></p>

## **2. Data Importing and Exporting** 
Pandas supports various data formats for importing and exporting, such as CSV, Excel, and SQL databases.

**Reading from CSV**

In [None]:
df = pd.read_csv('path/to/your/file.csv')
print(df.head())

**Writing data to CSV**

`df.to_csv('path/to/save/file.csv', index=False)`
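
To try the CSV round trip without touching the file system, the following sketch writes to an in-memory text buffer and reads it back. Using `io.StringIO` in place of a file path is an assumption made here for illustration; with a real file you would pass the path as shown above.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 34]})

# Write the DataFrame as CSV text into an in-memory buffer
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False leaves out the row labels
csv_text = buffer.getvalue()
print(csv_text)

# Read the same CSV text back into a new DataFrame
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```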

<hr>

<p id="3"></p>

## **3. Basic Data Operations** 
These operations involve viewing, selecting, indexing, and slicing data.

**Viewing Data**

In [None]:
# To look at the first few rows of a DataFrame.
print(df.head())  # First five rows

**Selecting a Column**

In [None]:
# To access a column in the DataFrame.
ages = df['Age']
print(ages)
print(df['Name'])  # Displays the 'Name' column

**Slicing Rows**

In [None]:
subset = df[0:2]  # First two rows
print(subset)
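
Indexing is listed above but not shown, so here is a minimal sketch of the two explicit indexers, `iloc` (by position) and `loc` (by label), using the sample data from Section 1. Note that a `loc` slice includes its end label, while an `iloc` slice excludes its end position.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 34, 29, 32]})

# Position-based: rows at positions 0 and 1 (end position is excluded)
first_two = df.iloc[0:2]
print(first_two)

# Label-based: rows labeled 0 through 2 (end label is included), 'Name' column only
names = df.loc[0:2, ['Name']]
print(names)

# A single cell: row label 1, column 'Age'
anna_age = df.loc[1, 'Age']
print(anna_age)  # 34
```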

**Filtering Data:** Selecting rows based on a condition.

In [None]:
filtered_df = df[df['Age'] > 30]  # Rows where age is greater than 30
print(filtered_df)
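
It can help to look at the intermediate step. The comparison produces a boolean Series (often called a mask), and the DataFrame keeps the rows where the mask is True. The sketch below uses the sample data from Section 1; the name `mask` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 34, 29, 32]})

# The comparison is evaluated for every row and yields True or False
mask = df['Age'] > 30
print(mask.tolist())  # [False, True, False, True]

# Indexing with the mask keeps only the rows marked True
print(df[mask])
```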

**Adding Columns:** You can add new columns to a DataFrame.

In [None]:
df['AgeInTenYears'] = df['Age'] + 10
print(df)
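
If you would rather not modify the original DataFrame, `assign` is one alternative: it returns a new DataFrame with the extra column. A small sketch, again using the sample data from Section 1 (the name `with_future_age` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 34, 29, 32]})

# assign() returns a new DataFrame that includes the extra column
with_future_age = df.assign(AgeInTenYears=df['Age'] + 10)
print(with_future_age)

# The original DataFrame is left unchanged
print('AgeInTenYears' in df.columns)  # False
```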

<hr>

<p id='4'></p>

## **4. Data Cleaning** 
Data cleaning involves handling missing data, data type conversion, and renaming/replacing values.

**Handling Missing Data**

In [None]:
df.fillna(0, inplace=True)  # Replace missing values with 0
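
Filling with 0 is only one option. The sketch below, which assumes a small sample with one missing age, shows how to count missing values first and then either drop the affected rows or fill them with a column-specific value such as the mean.

```python
import pandas as pd

# Sample data with one missing age (an assumption for illustration)
df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'],
                   'Age': [28, float('nan'), 29]})

# Count missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Option 1: drop every row that contains a missing value
dropped = df.dropna()
print(dropped)

# Option 2: fill missing ages with the mean of the known ages
filled = df.fillna({'Age': df['Age'].mean()})
print(filled)
```

Which option is appropriate depends on the data; a placeholder like 0 can distort statistics such as the average age.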

**Data Type Conversion**

In [None]:
df['Age'] = df['Age'].astype(float)
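
`astype` raises an error if a value cannot be converted. When a column arrives as text and may contain entries that are not numbers, `pd.to_numeric` with `errors='coerce'` is one way to convert what it can and mark the rest as missing. The sample values below are assumptions for illustration.

```python
import pandas as pd

# Ages read as text, with one entry that is not a number
raw_ages = pd.Series(['28', '34', 'unknown'])

# Values that cannot be parsed become NaN instead of raising an error
ages = pd.to_numeric(raw_ages, errors='coerce')
print(ages)
print(ages.isna().tolist())  # [False, False, True]
```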

**Renaming Columns**

In [None]:
# rename() returns a new DataFrame; df keeps its 'Age' column for the later examples
renamed_df = df.rename(columns={'Age': 'AgeYears'})
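
Replacing values is mentioned at the start of this section, so here is a minimal sketch of it. The `City` column and its abbreviations are hypothetical and do not appear in the earlier sample data.

```python
import pandas as pd

# Hypothetical data with abbreviated city names
df = pd.DataFrame({'Name': ['John', 'Anna'],
                   'City': ['NYC', 'LA']})

# Map each old value to its replacement; values not in the dict stay as they are
df['City'] = df['City'].replace({'NYC': 'New York', 'LA': 'Los Angeles'})
print(df)
```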

<hr>

<p id='5'></p>

## 5. Basic Data Analysis
Pandas provides functions to perform basic descriptive statistics and aggregations. 

**Descriptive Statistics**

In [None]:
# Quickly get a statistical summary of the data.
print(df.describe())

**Aggregation**

In [None]:
print(df['Age'].mean())  # Average age
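
Several statistics can be computed in one call by passing a list of function names to `agg`. The sketch below uses the ages from the sample data in Section 1.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 34, 29, 32]})

# One result per requested statistic, labeled by the function name
summary = df['Age'].agg(['min', 'max', 'mean'])
print(summary)

# Individual statistics are also available as methods
median_age = df['Age'].median()
print(median_age)  # 30.5
```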

<hr>

<p id='6'></p>

## 6. Conditional Selections
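Conditional selection filters rows with a boolean condition: comparing a column to a value produces a True/False Series, and indexing the DataFrame with that Series keeps only the rows where it is True.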

**Conditional Selections**

In [None]:
# Select rows where 'Age' is greater than 30
filtered_df = df[df['Age'] > 30]
print(filtered_df)
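
Conditions can also be combined. The sketch below uses the sample data from Section 1 and shows `&` (and), `|` (or), and `isin`. Each condition needs its own parentheses because `&` and `|` bind more tightly than comparison operators.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 34, 29, 32]})

# Both conditions must hold: older than 28 AND younger than 34
between = df[(df['Age'] > 28) & (df['Age'] < 34)]
print(between)

# At least one condition must hold: younger than 29 OR named Linda
either = df[(df['Age'] < 29) | (df['Name'] == 'Linda')]
print(either)

# Membership test: the name is one of the listed values
selected = df[df['Name'].isin(['Anna', 'Peter'])]
print(selected)
```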

<hr>

<p id='7'></p>

## 7. Data Transformation
Data transformation here means applying functions to columns with `apply`, aggregating groups of rows with `groupby`, and reshaping data with pivot tables.

**Using Apply**

In [None]:
# Define a function
def to_uppercase(name):
    return name.upper()

# Apply function to column
df['Name'] = df['Name'].apply(to_uppercase)
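
For common string operations, pandas also offers built-in methods under the `.str` accessor, which can stand in for a custom function. The sketch below compares the two approaches on the sample names from Section 1; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda']})

# Custom function applied to each value
via_apply = df['Name'].apply(lambda name: name.upper())

# Built-in string method applied to the whole column
via_str = df['Name'].str.upper()

print(via_apply.tolist())
print(via_str.tolist())
```

`apply` remains useful when no built-in method covers the logic you need.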

*Another Example*

In [None]:
# Sample DataFrame
df = pd.DataFrame({'Period': [201701, 202012, 202305]})

# Define function
def format_period(period):
    period = str(period)  # Convert to string
    return f"{period[:4]}-{period[4:]}"  # Insert hyphen

# Apply function
df['Formatted_Period'] = df['Period'].apply(format_period)

print(df)

**GroupBy Operations**

In [None]:
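# Hypothetical sample data (the earlier DataFrames have no 'Department' or 'Salary' column)
df = pd.DataFrame({'Department': ['HR', 'IT', 'HR', 'IT'],
                   'Salary': [50000, 70000, 54000, 66000]})
# Grouping data and calculating mean for each group
grouped = df.groupby('Department')
print(grouped['Salary'].mean())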
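
A group can be summarized with several statistics at once by passing a list to `agg`. The departments and salaries below are hypothetical sample values chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'IT', 'Sales'],
    'Salary': [50000, 70000, 54000, 66000, 45000],
})

# One row per department, one column per statistic
stats = df.groupby('Department')['Salary'].agg(['mean', 'max', 'count'])
print(stats)
```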

**Pivot Tables**

**Explanation of the Pivot Table Below:** \
	`1. index='Date' → Rows are based on the Date column.` \
	`2. columns='Store' → Columns are based on the Store (A & B).` \
	`3. values='Sales' → Values inside the table are from the Sales column.` \
	`4. aggfunc='sum' → Aggregates sales using the sum function.` 

In [None]:
# Sample sales data
data = {
    'Date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02', '2024-01-03'],
    'Store': ['A', 'B', 'A', 'B', 'A'],
    'Product': ['Apple', 'Apple', 'Banana', 'Banana', 'Apple'],
    'Sales': [100, 150, 200, 180, 120]
}

df = pd.DataFrame(data)

# Convert Date column to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Creating a pivot table
pivot = df.pivot_table(index='Date', columns='Store', values='Sales', aggfunc='sum')

print(pivot)
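
In the sample data, store B has no sale on 2024-01-03, so that cell of the pivot table is empty (NaN). One way to handle this, sketched below, is the `fill_value` parameter of `pivot_table`. Whether 0 is the right filler is an assumption that depends on what a missing sale means in your data.

```python
import pandas as pd

data = {
    'Date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02', '2024-01-03'],
    'Store': ['A', 'B', 'A', 'B', 'A'],
    'Product': ['Apple', 'Apple', 'Banana', 'Banana', 'Apple'],
    'Sales': [100, 150, 200, 180, 120]
}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])

# fill_value replaces empty Date/Store combinations with 0
pivot_filled = df.pivot_table(index='Date', columns='Store', values='Sales',
                              aggfunc='sum', fill_value=0)
print(pivot_filled)
```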