# Pandas Tutorial

# Introduction

Pandas is a Python library for data manipulation and analysis. Its two core data structures, Series and DataFrame, hold labeled, tabular data and come with methods for selecting, cleaning, combining, and summarizing it.

In this tutorial, we will cover the following:
1. Installation of Pandas
2. Pandas Data Structures: Series and DataFrame
3. DataFrame Operations: Viewing, Selection, and Filtering
4. Data Cleaning and Handling Missing Data
5. Data Merging, Joining, and Concatenation
6. Aggregation and Grouping
7. Working with Time Series Data
8. Input and Output: Reading and Writing Data


# 1. Installation


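You can install pandas with pip, Python's package installer. The leading `!` runs the command from a Jupyter notebook cell; omit it when installing from a terminal: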
```
!pip install pandas
```
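A quick way to confirm that the installation worked is to import the libraries and print their versions. This is a minimal sketch; the exact version strings depend on your environment.

```python
import pandas as pd
import numpy as np

# Print the installed versions to confirm both imports succeed
print("pandas:", pd.__version__)
print("NumPy:", np.__version__)
```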

In [None]:
import pandas as pd
import numpy as np


# 2. Pandas Data Structures: Series and DataFrame

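The two primary data structures in pandas are Series (one-dimensional) and DataFrame (two-dimensional). Both are built on NumPy arrays, but they add labeled axes (an index, plus column names for a DataFrame) and let a DataFrame's columns hold different data types.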
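One practical consequence of labeled axes is that arithmetic between two Series lines up values by index label rather than by position. The small sketch below uses two illustrative Series (`s1`, `s2`) whose labels only partly overlap.

```python
import pandas as pd

# Two Series whose index labels only partially overlap
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Addition aligns on labels; a label present in only one Series yields NaN
total = s1 + s2
print(total)

# The underlying values are still available as a NumPy array
print(s1.to_numpy())
```

Here only `'b'` and `'c'` exist in both Series, so they are the only labels with a numeric result.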


In [None]:
# Series: One-dimensional array-like object
data_series = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print("Series:\n", data_series)

# DataFrame: Two-dimensional, tabular data
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [24, 27, 22, 32],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}

dataframe = pd.DataFrame(data)
print("\nDataFrame:\n", dataframe)
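After building a DataFrame, it is often useful to inspect its shape, column labels, and column types. This sketch rebuilds the same example data so it runs on its own.

```python
import pandas as pd

# Same data as the DataFrame above, rebuilt so this cell is self-contained
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [24, 27, 22, 32],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
dataframe = pd.DataFrame(data)

print(dataframe.shape)          # (number of rows, number of columns)
print(list(dataframe.columns))  # column labels
print(dataframe.dtypes)         # one data type per column

# Each column of a DataFrame is itself a Series
print(type(dataframe['Age']))
```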

# 3. DataFrame Operations: Viewing, Selection, and Filtering

Pandas provides intuitive methods to view, select, and filter data within a DataFrame.



In [None]:

# Viewing data
print("\nFirst 2 rows of DataFrame:\n", dataframe.head(2))
print("\nSummary statistics (numeric columns only):\n", dataframe.describe())

# Selecting a single column (returns a Series)
print("\nSelect 'Name' column:\n", dataframe['Name'])

# Selecting multiple columns
print("\nSelect 'Name' and 'City' columns:\n", dataframe[['Name', 'City']])

# Filtering rows based on conditions
print("\nRows where Age > 25:\n", dataframe[dataframe['Age'] > 25])
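Rows can also be selected by position with `.iloc` or by label with `.loc`, and several filter conditions can be combined. The sketch below reuses the example data; note that `.loc` label slices include their end point, while `.iloc` slices do not.

```python
import pandas as pd

dataframe = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                          'Age': [24, 27, 22, 32],
                          'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']})

# .iloc selects by integer position: first two rows, first two columns
first_two = dataframe.iloc[0:2, 0:2]

# .loc selects by label; the default row labels are 0..3 and the slice 1:2 is inclusive
labelled = dataframe.loc[1:2, ['Name', 'City']]

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
older_outside_ny = dataframe[(dataframe['Age'] > 23) & (dataframe['City'] != 'New York')]

print(first_two)
print(labelled)
print(older_outside_ny)
```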

# 4. Data Cleaning and Handling Missing Data

Pandas marks missing entries with null markers such as NaN and provides methods like `isna()`, `fillna()`, and `dropna()` to detect, fill, or drop them.


In [None]:
# Creating data with missing values (None becomes NaN in the numeric 'Age' column)
data_with_nan = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                 'Age': [24, None, 22, 32],
                 'City': ['New York', 'Los Angeles', None, 'Houston']}

df_with_nan = pd.DataFrame(data_with_nan)
print("\nDataFrame with missing values:\n", df_with_nan)

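# Filling missing values per column: the mean for 'Age', a placeholder for 'City'
# (a single fillna('Unknown') would also put a string into the numeric 'Age' column)
df_filled = df_with_nan.fillna({'Age': df_with_nan['Age'].mean(), 'City': 'Unknown'})
print("\nFilled missing values:\n", df_filled)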

# Dropping every row that contains at least one missing value
df_dropped = df_with_nan.dropna()
print("\nDropped rows with NaN values:\n", df_dropped)
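Before filling or dropping, it usually helps to count the missing values in each column; `dropna` can also be limited to specific columns. Invalid entries, such as text in a numeric field, can be converted to NaN with `pd.to_numeric(..., errors='coerce')`. The `raw_ages` Series below is a hypothetical example of such messy input.

```python
import pandas as pd

df_with_nan = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                            'Age': [24, None, 22, 32],
                            'City': ['New York', 'Los Angeles', None, 'Houston']})

# Count missing values per column
missing_counts = df_with_nan.isna().sum()
print(missing_counts)

# Drop a row only when the 'Age' column is missing
age_known = df_with_nan.dropna(subset=['Age'])
print(age_known)

# Hypothetical raw input: non-numeric strings become NaN with errors='coerce'
raw_ages = pd.Series(['24', 'unknown', '22'])
clean_ages = pd.to_numeric(raw_ages, errors='coerce')
print(clean_ages)
```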

# 5. Data Merging, Joining, and Concatenation

Pandas combines DataFrames in two main ways: `concat` stacks them along an axis, while `merge` joins them on shared key columns, much like a SQL join.


In [None]:
# Concatenation
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

concat_df = pd.concat([df1, df2], ignore_index=True)
print("\nConcatenated DataFrames:\n", concat_df)

# Merging DataFrames
left = pd.DataFrame({'key': ['A', 'B', 'C'], 'left_value': [1, 2, 3]})
right = pd.DataFrame({'key': ['A', 'B', 'D'], 'right_value': [4, 5, 6]})

merged_df = pd.merge(left, right, on='key', how='outer')
print("\nMerged DataFrame (outer join):\n", merged_df)
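The `how` argument of `merge` controls which keys are kept, and `indicator=True` adds a `_merge` column recording whether each row matched on both sides. This sketch reuses the `left` and `right` frames from the cell above.

```python
import pandas as pd

left = pd.DataFrame({'key': ['A', 'B', 'C'], 'left_value': [1, 2, 3]})
right = pd.DataFrame({'key': ['A', 'B', 'D'], 'right_value': [4, 5, 6]})

# Collect the surviving keys for each join type
keys_by_how = {how: pd.merge(left, right, on='key', how=how)['key'].tolist()
               for how in ['inner', 'left', 'right', 'outer']}
print(keys_by_how)

# '_merge' is 'both', 'left_only', or 'right_only' for each row
tagged = pd.merge(left, right, on='key', how='outer', indicator=True)
print(tagged)
```

An inner join keeps only keys found in both frames, left and right joins keep every key from one side, and an outer join keeps the union.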

# 6. Aggregation and Grouping

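`groupby` splits the rows into groups based on the values in one or more columns, applies an aggregation such as `mean` to each group, and combines the results into a new Series or DataFrame.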


In [None]:
# Grouping by a column (each city appears once in this data, so each group has one row)
grouped_df = dataframe.groupby('City')['Age'].mean()
print("\nAverage Age per City:\n", grouped_df)

# Aggregating data using multiple functions
agg_df = dataframe.groupby('City').agg({'Age': ['mean', 'min', 'max']})
print("\nAggregated Age statistics by City:\n", agg_df)
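Because every city appears only once in the example DataFrame, its groups are trivial. The sketch below uses a small hypothetical `sales` table with repeated cities and the named-aggregation form of `agg`, where each keyword argument becomes an output column.

```python
import pandas as pd

# Hypothetical sales data: several rows per city
sales = pd.DataFrame({'City': ['Chicago', 'Houston', 'Chicago', 'Houston', 'Chicago'],
                      'Amount': [100, 250, 150, 50, 200]})

# Named aggregation: output_column=(input_column, function)
summary = sales.groupby('City').agg(
    total=('Amount', 'sum'),
    average=('Amount', 'mean'),
    orders=('Amount', 'count'),
)
print(summary)
```

Named aggregation produces flat column names, which avoids the two-level column headers created by the dictionary form used above.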

# 7. Working with Time Series Data

Pandas has robust tools for handling time series data, including parsing dates and resampling.


In [None]:
# Creating a time series
date_range = pd.date_range('2024-01-01', periods=5, freq='D')
rng = np.random.default_rng(0)  # seeded so the random values are reproducible
time_series = pd.Series(rng.standard_normal(5), index=date_range)
print("\nTime Series Data:\n", time_series)

# Resampling: group the daily values into 2-day bins and average each bin
resampled_series = time_series.resample('2D').mean()
print("\nResampled (2-day mean):\n", resampled_series)
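Dates often arrive as strings, for example from a CSV file, and must be parsed before they can be resampled. This sketch uses a hypothetical `readings` table, converts its `date` column with `pd.to_datetime`, sets it as the index, and averages the values in 3-day bins.

```python
import pandas as pd

# Hypothetical readings with dates stored as strings
readings = pd.DataFrame({'date': ['2024-01-01', '2024-01-02', '2024-01-05', '2024-01-06'],
                         'value': [1.0, 3.0, 5.0, 7.0]})

# Parse the strings into datetime values and use them as the index
readings['date'] = pd.to_datetime(readings['date'])
readings = readings.set_index('date')

# '3D' creates 3-day bins starting at the first date: Jan 1-3 and Jan 4-6
resampled = readings['value'].resample('3D').mean()
print(resampled)
```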

# 8. Input and Output: Reading and Writing Data

Pandas supports a wide variety of file formats to read and write data, including CSV, Excel, JSON, and SQL.

# Reading from a CSV file

You can read data from a CSV file using:

```
df_from_csv = pd.read_csv('data.csv')
```
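`read_csv` accepts any file-like object as well as a path, which makes it easy to try options without creating a file. In this sketch an `io.StringIO` object with illustrative contents stands in for `data.csv`; empty fields become NaN and `parse_dates` converts the named column to datetimes.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file (contents are illustrative)
csv_text = """Name,Age,Joined
Alice,24,2023-05-01
Bob,,2024-02-15
"""

df_from_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=['Joined'])
print(df_from_csv)
print(df_from_csv.dtypes)
```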

# Writing DataFrame to a CSV file

To write a DataFrame to a CSV file, use:

```
dataframe.to_csv('output.csv', index=False)
```
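When no path is given, `to_csv` returns the CSV text as a string, which is handy for checking what will be written. This sketch writes a small DataFrame with `index=False` and reads it back; without `index=False`, the row index would come back as an extra `Unnamed: 0` column.

```python
import io
import pandas as pd

dataframe = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [24, 27]})

# With no path argument, to_csv returns the CSV text
csv_text = dataframe.to_csv(index=False)
print(csv_text)

# Reading the text back reproduces the original columns and values
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip.equals(dataframe))
```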

# Reading from and writing to Excel

Reading and writing `.xlsx` files requires an additional engine library such as openpyxl. Install it first:

```
!pip install openpyxl
```

Reading:

```
df_from_excel = pd.read_excel('data.xlsx')
```

Writing:

```
dataframe.to_excel('output.xlsx', index=False)
```
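SQL, also listed among the supported formats, works through a database connection. This sketch assumes an in-memory SQLite database from Python's built-in `sqlite3` module; the table name `people` is illustrative. `to_sql` writes the DataFrame as a table, and `read_sql` returns the result of a query as a new DataFrame.

```python
import sqlite3
import pandas as pd

dataframe = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [24, 27, 22]})

# In-memory SQLite database; pandas accepts a sqlite3 connection directly
conn = sqlite3.connect(':memory:')

# Write the DataFrame to a table named 'people' (illustrative name)
dataframe.to_sql('people', conn, index=False)

# Run a query and load the result into a DataFrame
adults = pd.read_sql('SELECT Name, Age FROM people WHERE Age > 23 ORDER BY Age', conn)
print(adults)

conn.close()
```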


# Conclusion

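In this tutorial, we covered the basics of pandas: creating Series and DataFrames, selecting and filtering data, handling missing values, combining DataFrames, grouping and aggregation, resampling time series, and reading and writing files. Pandas is an essential tool for data analysis and manipulation in Python.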
