# PANDAS

### CONTENTS

## 1. Introduction to Pandas
    1.1 What is Pandas?
    1.2 Why Use Pandas?
## 2. Installing and Importing Pandas
    2.1 Installation via Anaconda
    2.2 Installation via pip
    2.3 Importing Pandas
## 3. Data Structures in Pandas
    3.1 Series
    3.2 DataFrame
## 4. Reading and Writing Data
    4.1 Reading CSV Files
    4.2 Writing CSV Files
    4.3 Reading Excel Files
    4.4 Writing Excel Files
## 5. Data Manipulation
    5.1 Indexing and Selection
    5.2 Filtering Data
    5.3 Sorting Data
    5.4 Aggregating Data
    5.5 Handling Missing Data
## 6. Data Cleaning
    6.1 Removing Duplicates
    6.2 Renaming Columns
    6.3 Handling Null Values
    6.4 Changing Data Types
## 7. Data Visualization
    7.1 Line Plots
    7.2 Bar Plots
    7.3 Scatter Plots
    7.4 Histograms
    7.5 Box Plots
## 8. Time Series Analysis
    8.1 Creating Time Series Data
    8.2 Resampling and Frequency Conversion
    8.3 Shifting and Lagging
    8.4 Rolling Window Functions
## 9. Conclusion

*********************************************************************

# 1. Introduction to Pandas
#### 1.1 What is Pandas?
Pandas is an open-source library in Python that provides data
manipulation and analysis tools. It is built on top of NumPy and
provides easy-to-use data structures and data analysis
functions. Pandas is widely used in the field of data science for
tasks such as data cleaning, data transformation, data
visualization, and data analysis.
#### 1.2 Why Use Pandas?
* Easy handling of structured data: Pandas provides powerful data structures, such as Series and DataFrame, that allow for easy manipulation and analysis of structured data.
* Data alignment and integration: Pandas can handle data from various sources and align them based on common indices or column names, making it easy to integrate data from different sources.
* Efficient data manipulation: Pandas provides vectorized operations and optimized algorithms, which are typically much faster than equivalent row-by-row Python loops.
* Missing data handling: Pandas offers flexible tools for handling missing data, allowing users to either drop or fill in missing values.
* Data visualization: Pandas integrates well with popular data visualization libraries, such as Matplotlib and Seaborn, enabling the creation of insightful visualizations.
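
The alignment behaviour mentioned in the list above can be illustrated with a minimal sketch. The two Series and their labels are made-up values, not data from this guide.

```python
import pandas as pd

# Two Series with partly overlapping index labels (made-up values).
q1 = pd.Series([100, 200, 300], index=["a", "b", "c"])
q2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Addition aligns on index labels, not on position.
# Labels present on only one side ("a" and "d") become NaN.
total = q1 + q2

# fill_value treats a label missing from one side as 0 instead of NaN.
total_filled = q1.add(q2, fill_value=0)

print(total)
print(total_filled)
```

Using `add` with `fill_value` is one option among several; whether a missing label should count as zero depends on the data.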


# 2. Installing and Importing Pandas

#### 2.1 Installation via Anaconda
      conda install pandas
#### 2.2 Installation via pip
      pip install pandas
#### 2.3 Importing Pandas
      import pandas as pd

In [1]:
pip install pandas

Note: you may need to restart the kernel to use updated packages.


# 3. Data Structures in Pandas
#### 3.1 Series
  

In [2]:
import pandas as pd
import numpy as np

# Creating Series from a list
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
series

0    10
1    20
2    30
3    40
4    50
dtype: int64

#### 3.2 DataFrame
  

In [6]:
data = { 'Name' : ['John','Jane','Mike'],
        'Age' : [25, 30, 35],
        'City' : ['New York', 'London', 'Paris']}

df = pd.DataFrame(data)
print(df)

   Name  Age      City
0  John   25  New York
1  Jane   30    London
2  Mike   35     Paris


# 4. Reading and Writing Data
#### 4.1 Reading CSV Files


In [None]:
df = pd.read_csv('data.csv')
df

#### 4.2 Writing CSV Files


In [None]:
df.to_csv('data.csv')
df
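
To check that writing and reading mirror each other, the sketch below round-trips a small frame through CSV text. It uses an in-memory `io.StringIO` buffer rather than a file on disk, and the sample values are made up for illustration. Passing `index=False` is an assumption about what is usually wanted: without it, `to_csv` also writes the row index as an extra first column.

```python
import io

import pandas as pd

# Small sample frame (made-up values).
df = pd.DataFrame({"Name": ["John", "Jane"], "Age": [25, 30]})

# Write to an in-memory text buffer instead of a file.
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False: do not write the row index

# Rewind the buffer and read the CSV text back into a new frame.
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored)
```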

#### 4.3 Reading Excel Files

In [None]:
df = pd.read_excel('data.xlsx')
df

#### 4.4 Writing Excel Files

In [None]:
df.to_excel('output.xlsx', index = False)


# 5. Data Manipulation
#### 5.1 Indexing and Selection

In [None]:
#COLUMNS

# Selecting a Single Column
col = df['Column_Name']

# Selecting multiple columns (note the double brackets: a list of names)
cols = df[['Column1', 'Column2']]

In [None]:
#ROWS

#Selecting rows based on conditions
selected_rows = df[df['Column_Name']>50]

#Selecting rows based on their positions
selected_rows = df.iloc[2:5]
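
The selections in this section can be tried on concrete data with the minimal sketch below. The frame and its `Name` and `Score` columns are hypothetical stand-ins for `Column_Name`.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame(
    {"Name": ["John", "Jane", "Mike", "Sara"], "Score": [45, 72, 88, 51]}
)

# Multiple columns: pass a list of column names.
subset = df[["Name", "Score"]]

# Rows by condition: a boolean mask keeps rows where it is True.
high = df[df["Score"] > 50]

# Rows by position: iloc slices exclude the end position.
middle = df.iloc[1:3]

# Rows and one column together: loc takes a row mask and a column label.
names_of_high = df.loc[df["Score"] > 50, "Name"]

print(high)
print(middle)
```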

#### 5.2 Filtering Data


In [None]:
filtered_data = df[df['Column']>50]
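
Filters often combine several conditions. The sketch below reuses the Name/Age/City data from section 3.2; the particular thresholds are arbitrary. Each condition needs its own parentheses, and `&` / `|` are used in place of Python's `and` / `or`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John", "Jane", "Mike"],
        "Age": [25, 30, 35],
        "City": ["New York", "London", "Paris"],
    }
)

# Both conditions must hold (logical AND).
both = df[(df["Age"] > 26) & (df["City"] != "Paris")]

# At least one condition must hold (logical OR).
either = df[(df["Age"] < 26) | (df["City"] == "Paris")]

# Membership test against a list of allowed values.
in_list = df[df["City"].isin(["London", "Paris"])]

print(both)
print(either)
print(in_list)
```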

#### 5.3 Sorting Data


In [None]:
# Sorting Data based on a single column
sorted_data = df.sort_values('Column')

# Sorting data based on multiple columns
sorted_data = df.sort_values(['Column1', 'Column2'])
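
`sort_values` sorts in ascending order by default. A minimal sketch of descending and mixed-direction sorts follows, using made-up data.

```python
import pandas as pd

# Made-up sample data with repeated City values.
df = pd.DataFrame(
    {
        "Name": ["Mike", "Jane", "John", "Sara"],
        "City": ["Paris", "London", "Paris", "London"],
        "Age": [35, 30, 25, 40],
    }
)

# Single column, largest value first.
by_age_desc = df.sort_values("Age", ascending=False)

# City ascending, then Age descending within each city.
mixed = df.sort_values(["City", "Age"], ascending=[True, False])

# Sorting keeps the original index labels; reset_index renumbers the rows.
mixed_renumbered = mixed.reset_index(drop=True)

print(by_age_desc)
print(mixed_renumbered)
```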

#### 5.4 Aggregating Data

In [None]:
# Calculating the sum of a column
col_Sum = df['Column_Name'].sum()

# Calculating the mean of a column
col_Mean = df['Column_Name'].mean()

# Counting the number of occurrences of each value in a column
col_count = df['Column_Name'].value_counts()
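
The aggregations above summarize a whole column. A common extension, sketched here with hypothetical `Category` and `Value` columns, is to compute the same statistics per group with `groupby`.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "Category": ["A", "B", "A", "B", "A"],
        "Value": [10, 20, 30, 40, 50],
    }
)

# One aggregate per group.
totals = df.groupby("Category")["Value"].sum()

# Several aggregates per group at once.
summary = df.groupby("Category")["Value"].agg(["sum", "mean", "count"])

print(totals)
print(summary)
```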

#### 5.5 Handling Missing Data

In [None]:
# Dropping rows with missing values
clean_data = df.dropna()

# Filling missing values with a specific value
filled_data = df.fillna(0)

# Filling missing values with the mean of each numeric column
mean_filled_data = df.fillna(df.mean(numeric_only=True))
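
A minimal sketch of these operations on a frame that mixes text and numbers (made-up values). In recent pandas versions, taking the mean of a frame that contains text columns can raise an error, so the sketch restricts the mean to numeric columns with `numeric_only=True`.

```python
import numpy as np
import pandas as pd

# Made-up sample data with one missing age.
df = pd.DataFrame(
    {"Name": ["John", "Jane", "Mike"], "Age": [25.0, np.nan, 35.0]}
)

# Count missing values per column before deciding how to handle them.
missing_per_column = df.isna().sum()

# Drop every row that contains at least one missing value.
dropped = df.dropna()

# Fill numeric gaps with the column mean; text columns are left untouched.
filled = df.fillna(df.mean(numeric_only=True))

print(missing_per_column)
print(filled)
```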

# 6. Data Cleaning
#### 6.1 Removing Duplicates


In [None]:
# Checking for duplicate rows
duplicate_rows = df.duplicated()

# Dropping duplicate rows
clean_data = df.drop_duplicates()
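
By default, rows count as duplicates only when every column matches, and the first occurrence is kept. The sketch below, on made-up data, also shows the `subset` and `keep` parameters for narrowing the comparison.

```python
import pandas as pd

# Made-up sample data: the third row repeats the first.
df = pd.DataFrame(
    {
        "Name": ["John", "Jane", "John"],
        "City": ["New York", "London", "New York"],
    }
)

# True marks a row that repeats an earlier row.
flags = df.duplicated()

# Keep the first occurrence of each fully identical row.
unique = df.drop_duplicates()

# Compare on City only, keeping the last occurrence per city.
last_per_city = df.drop_duplicates(subset=["City"], keep="last")

print(flags)
print(unique)
```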

#### 6.2 Renaming Columns

In [None]:
df.rename(columns = {'Old Name':'New_Name'}, inplace = True)

#### 6.3 Handling Null Values
* isnull()
* notnull()
* dropna()

In [None]:
# Checking for null values
null_values = df.isnull()

# Dropping rows with null values
clean = df.dropna()

#### 6.4 Changing Data Types

In [None]:
df['Column'] = df['Column'].astype(int)
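
`astype(int)` raises an error if any value cannot be converted. One alternative, sketched here on made-up data, is `pd.to_numeric` with `errors="coerce"`, which turns unparseable entries into NaN so they can be handled as missing values.

```python
import pandas as pd

# Clean text digits convert directly.
clean = pd.Series(["1", "2", "3"]).astype(int)

# A column with a non-numeric entry (made-up example).
raw = pd.Series(["1", "2", "three"])

# errors="coerce" replaces values that cannot be parsed with NaN.
numbers = pd.to_numeric(raw, errors="coerce")

print(clean)
print(numbers)
```

The coerced result is a float column, because NaN cannot be stored in a plain integer column.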

# 7. Data Visualization
#### 7.1 Line Plots


In [None]:
import matplotlib.pyplot as plt

df.plot(x = 'Date', y = 'Value')
plt.title('Line Plot')
plt.show()

#### 7.2 Bar Plots

In [None]:
df.plot(kind='bar', x='Category', y='Value')
plt.title('Bar Plot')
plt.show()

#### 7.3 Scatter Plots

In [None]:
df.plot(kind='scatter', x='X', y='Y')
plt.title('Scatter Plot')
plt.show()

#### 7.4 Histograms


In [None]:
df['Column_Name'].plot(kind='hist')
plt.title('Histogram')
plt.show()

#### 7.5 Box Plots

In [None]:
df.boxplot(column = 'Value', by = 'Category')
plt.title('Box Plot')
plt.show()

# 8. Time Series Analysis
#### 8.1 Creating Time Series Data


In [None]:
# Creating a time series with a fixed frequency
dates = pd.date_range(start = '2021-01-01', end = '2021-12-31', freq = 'D')

#### 8.2 Resampling and Frequency Conversion


In [None]:
# Resampling to a lower frequency (daily to weekly)
weekly_data = df.resample('W').mean()
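
`resample` needs a datetime-like index (or a datetime column named with `on=`). The sketch below builds a small daily frame first; the `Value` column and its contents are made up for illustration.

```python
import pandas as pd

# Fourteen daily timestamps starting on Friday, 2021-01-01.
dates = pd.date_range(start="2021-01-01", periods=14, freq="D")

# Made-up values 0..13, indexed by date.
df = pd.DataFrame({"Value": range(14)}, index=dates)

# 'W' groups rows into weeks ending on Sunday and labels each
# group with that Sunday's date.
weekly_data = df.resample("W").mean()

print(weekly_data)
```

Because the data starts on a Friday, the first weekly bin covers only three days and the last one only four.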

#### 8.3 Shifting and Lagging

In [None]:
# Shifting time series data by specified number of periods
shifted_data = df['Column_Name'].shift(1)
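
A typical use of `shift` is comparing each observation with the previous one. A minimal sketch with made-up daily values:

```python
import pandas as pd

# Made-up daily values.
s = pd.Series(
    [10, 12, 15, 11],
    index=pd.date_range("2021-01-01", periods=4, freq="D"),
)

# shift(1) moves values one period later; the first slot becomes NaN.
previous = s.shift(1)

# Day-over-day change: current value minus the previous day's value.
change = s - previous

print(change)
```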

#### 8.4 Rolling Window Functions


In [None]:
# Calculating the rolling mean over a window of size 3
rolling_mean = df['Column_Name'].rolling(window=3).mean()
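
With a window of 3, the first two results are NaN because a full window is not yet available. The sketch below, on made-up values, shows this and the `min_periods` parameter, which allows partial windows at the start.

```python
import pandas as pd

# Made-up values.
s = pd.Series([1, 2, 3, 4, 5])

# Full windows only: the first two positions have fewer than 3 values.
rolling_mean = s.rolling(window=3).mean()

# min_periods=1 computes the mean over however many values are available.
partial_mean = s.rolling(window=3, min_periods=1).mean()

print(rolling_mean)
print(partial_mean)
```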

# 9. Conclusion
This practical guide has introduced you to the key features of
Pandas for data science. You learned how to install Pandas,
import it into your Python environment, and work with its
two core data structures, Series and DataFrame. You
also explored various data manipulation techniques, data
cleaning methods, data visualization options, and time series
analysis capabilities in Pandas. With this knowledge, you can
start using Pandas effectively in your data science projects.