# Pandas

Pandas is an open-source data analysis and manipulation library for Python. It is widely used for working with structured data (like tables), offering powerful, flexible, and easy-to-use tools to process and analyze large amounts of data efficiently.

## Why Use Pandas?

* Pandas lets us analyze large data sets and draw conclusions using statistical methods.
* Pandas can clean messy data sets and make them readable and relevant.
* Relevant data is very important in data science.

## Installation

Install it using this command: `pip install pandas`

## Data Structures

Pandas provides two primary data structures: Series (one-dimensional) and DataFrame (two-dimensional), which make structured data easy to manipulate.

In [None]:
# Notebook magic: installs pandas into the environment of the running kernel
%pip install pandas

In [None]:
import pandas as pd

# Create a Series
s = pd.Series([1, 2, 3, 4, 5])
print(s)

m = [9,2,4]
# Labels
myvar = pd.Series(m, index = ["x", "y", "z"])
print(myvar)

# Create a DataFrame
data = {'Name': ['John', 'Jane', 'Tom'],
        'Age': [23, 25, 30],
        'City': ['New York', 'Paris', 'London']}
df = pd.DataFrame(data)
print(df)

# Select rows by index label: loc returns one or more specified rows (the end label is included)
print(df.loc[0:1])

# Load files into a DataFrame
# df = pd.read_csv('data.csv')
# print(df)
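
The commented-out lines above need a `data.csv` file on disk. As a minimal sketch that runs without one, the example below passes `read_csv` an in-memory string through `io.StringIO`. The CSV content and the names `csv_text` and `csv_df` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real 'data.csv' file
csv_text = """Name,Age,City
John,23,New York
Jane,25,Paris
Tom,30,London
"""

# read_csv accepts a file path or any file-like object
csv_df = pd.read_csv(io.StringIO(csv_text))
print(csv_df)
print(csv_df.shape)  # (number of rows, number of columns)
```

With a real file, the `io.StringIO(...)` argument would simply be replaced by the file path.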

## DataFrame Manipulation

DataFrames can be sliced, indexed, and modified like matrices. This includes adding, modifying, and removing rows/columns.

In [None]:
# Select a single column
print(df['Name'])

# Add a new column
df['Salary'] = [50000, 60000, 70000]

# Select rows using slicing
print(df.loc[1:2])  # label-based: rows labelled 1 and 2 (end label included)
print(df.iloc[1:2])  # position-based: only the row at position 1 (end position excluded)

# Drop a column
df.drop('Salary', axis=1, inplace=True)
print(df)
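
The cell above works on columns. The text also mentions modifying and removing rows, so here is a rough sketch of one way to do that. The `people` DataFrame and its values are invented for this example, and `pd.concat` is only one of several ways to append a row.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['John', 'Jane', 'Tom'],
                       'Age': [23, 25, 30]})

# Modify a single value: row label 1, column 'Age'
people.loc[1, 'Age'] = 26

# Add a row by concatenating a one-row DataFrame
new_row = pd.DataFrame({'Name': ['Ana'], 'Age': [41]})
people = pd.concat([people, new_row], ignore_index=True)

# Remove the row labelled 0; drop returns a new DataFrame
people = people.drop(index=0)
print(people)

# The remaining row labels are 1, 2, 3, so labels and positions now differ
print(people.loc[1:2])   # labels 1 and 2: two rows
print(people.iloc[1:2])  # position 1 only: one row
```

Because the labels no longer start at 0 after the drop, the last two lines return different rows, which is the practical difference between `loc` and `iloc`.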

## Analyzing DataFrames

One of the most commonly used methods for getting a quick overview of a DataFrame is `head()`.
The `head()` method returns the column headers and a specified number of rows, starting from the top (five rows by default).

In [None]:
print(df.head())

# Just the first row
print(df.head(1))


There is also a `tail()` method for viewing the last rows of the DataFrame.
The `tail()` method returns the headers and a specified number of rows, starting from the bottom.


In [None]:
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank'],
        'Age': [25, 30, 35, 40, 45, 50],
        'City': ['New York', 'Paris', 'Berlin', 'London', 'Tokyo', 'Sydney']}

df = pd.DataFrame(data)

# Display the last 3 rows
print(df.tail(3))

The DataFrame object has a method called `info()` that summarizes the data set: the index range, the column names, the number of non-null values per column, and the data types.

In [None]:
# info() prints its summary directly and returns None, so print() is not needed
df.info()
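
A small sketch of how `info()` behaves: it writes its report to standard output and returns `None`, so wrapping it in `print()` also prints `None`. To get the report as a string, a buffer can be passed through the `buf` parameter. The `frame` DataFrame and the variable names are assumptions for this example.

```python
import io

import pandas as pd

frame = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                      'Age': [25, 30, 35]})

# info() prints the report itself; its return value is None
result = frame.info()
print(result is None)

# Pass a text buffer to capture the report as a string instead
buffer = io.StringIO()
frame.info(buf=buffer)
report = buffer.getvalue()
print(report)
```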

## Cleaning Data

Data cleaning involves handling missing values, transforming data, and making the dataset ready for analysis.

In [None]:
# Creating a DataFrame with some missing values
data = {'Name': ['Alice', 'Bob', None, 'David', 'Eva'],
        'Age': [25, None, 35, 40, None],
        'City': ['New York', 'Paris', 'Berlin', 'London', None]}

df = pd.DataFrame(data)

# Display original DataFrame
print("Original DataFrame:")
print(df)

# Fill missing values in the 'Age' column with the mean age
# (fillna returns a new Series, so assign the result back to the column)
df['Age'] = df['Age'].fillna(df['Age'].mean())

# Drop rows with any missing data
df.dropna(inplace=True)

# Renaming the 'Name' column to 'Full Name'
df.rename(columns={'Name': 'Full Name'}, inplace=True)

# Convert the 'Full Name' column to uppercase
df['Full Name'] = df['Full Name'].str.upper()

# Display the cleaned DataFrame
print("\nCleaned DataFrame:")
print(df)
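
Before cleaning, it often helps to see how much data is missing. The sketch below counts missing values per column with `isna().sum()` and illustrates that `fillna` returns a new object instead of changing the column in place. The `raw` DataFrame is a made-up, smaller version of the data above.

```python
import pandas as pd

raw = pd.DataFrame({'Name': ['Alice', 'Bob', None],
                    'Age': [25, None, 35]})

# Count missing values per column
missing = raw.isna().sum()
print(missing)

# fillna returns a new Series; the original column is untouched
filled_age = raw['Age'].fillna(raw['Age'].mean())
print(raw['Age'].isna().sum())     # still one missing value in the original
print(filled_age.isna().sum())     # none in the filled copy

# Assign the result back to keep it
raw['Age'] = filled_age
print(raw)
```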

## Filtering and Sorting

You can filter data based on conditions and sort it by specific columns.

In [None]:
# Filter rows where age is greater than 25
filtered_df = df[df['Age'] > 25]
print(filtered_df)

# Sort by a column
sorted_df = df.sort_values(by='Age', ascending=False)
print(sorted_df)
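
Conditions can also be combined, and sorting can use more than one column. The following is a minimal sketch under assumed data: the `staff` DataFrame and its values are hypothetical.

```python
import pandas as pd

staff = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                      'Age': [25, 30, 35, 30],
                      'City': ['Paris', 'Paris', 'Berlin', 'London']})

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
paris_over_25 = staff[(staff['Age'] > 25) & (staff['City'] == 'Paris')]
print(paris_over_25)

# Sort by Age descending, breaking ties by Name ascending
ordered = staff.sort_values(by=['Age', 'Name'], ascending=[False, True])
print(ordered)
```

The parentheses matter because `&` and `|` bind more tightly than comparison operators in Python.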

## Aggregation and Grouping

Grouping allows you to perform operations like sum, mean, and count on subsets of data.

In [None]:
# Group data by city and calculate mean age
grouped = df.groupby('City')['Age'].mean()

# Count the number of occurrences per city
city_count = df.groupby('City').size()
print(grouped)
print(city_count)
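
Several aggregations can be computed in one pass with `agg`. This is a sketch on invented data (the `visits` DataFrame is an assumption) in which some cities appear more than once, so the grouping has a visible effect.

```python
import pandas as pd

visits = pd.DataFrame({'City': ['Paris', 'Paris', 'Berlin', 'London', 'Berlin'],
                       'Age': [25, 35, 40, 30, 20]})

# One row per city, one column per aggregation
summary = visits.groupby('City')['Age'].agg(['sum', 'mean', 'count'])
print(summary)
```

By default the group keys are sorted, so the rows of the result come back in alphabetical order by city.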

This is just an overview of pandas. Feel free to ask your teaching assistants any questions.