# <div class="alert alert-danger">KnowlBook - Data Analysis with Pandas</div>

# <div class="alert alert-success">Chapter 1 - Introduction to Pandas</div>

![welcome.jpg](attachment:welcome.jpg)

<div class="alert alert-success">Welcome, my fellow data enthusiasts!! Today, we shall dive into the enchanting world of pandas, a powerful Python library for data manipulation and analysis. With pandas, we can store and analyze data in two primary structures: Series and DataFrame.</div>
    
    

<div class="alert alert-success">Imagine you have a vast collection of data, much like the colorful leaves scattered across the forest floor. Each piece of data tells a unique story. But to make sense of all this data and uncover its hidden wisdom, we need a special key, one that can unlock the potential within each piece of information. That magical key, my friends, is pandas!</div>
    
    

<div class="alert alert-success">Pandas is not only powerful but also flexible. Just like bamboo bends and sways with the wind, pandas can handle diverse data types and missing values, and it lets us create new calculated columns with little effort.
</div>    
    

<div class="alert alert-success">And the best part is that pandas works harmoniously with other popular Python libraries, such as NumPy, Matplotlib, and SciPy. Together, they form a formidable team, enabling us to conquer any data analysis challenge that comes our way!
    
</div>

![f84723e1-4090-4fd0-8684-935997958a51%20%281%29%20(1).jpg](attachment:f84723e1-4090-4fd0-8684-935997958a51%20%281%29%20(1).jpg)

## <div class="alert alert-success">Import Library</div>

<div class="alert alert-success">Don't forget to invite me! You can call me by my nickname (here <mark>pd</mark>).

To install me in your world:

`pip install pandas`

To import me:

`import pandas as pd`

To check how old I am (my version):

`print(pd.__version__)` </div>

In [1]:
#import Pandas
import pandas as pd

![file%20formats.jpg](attachment:file%20formats.jpg)

<div class="alert alert-success">You can read many kinds of data sources, such as:

   * CSV files: `pd.read_csv()`

   * JSON files: `pd.read_json()`

   * HTML tables: `pd.read_html()`

   * Excel files: `pd.read_excel()`

   * SQL queries or database tables: `pd.read_sql()`

   * Pickle files: `pd.read_pickle()`

and more.</div>
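To see one of these readers in action without needing a file on disk, here is a minimal sketch. The CSV text is made up for illustration; in practice you would pass a file path to `pd.read_csv()` instead of the in-memory buffer.

```python
import io
import pandas as pd

# Hypothetical CSV content, standing in for a real file on disk
csv_text = """Name,Age,City
JJ,25,New York
Yoyo,30,London
"""

# read_csv accepts a file path or any file-like object
people = pd.read_csv(io.StringIO(csv_text))

print(people)
print(people.dtypes)  # pandas infers a data type for every column
```

The other readers follow the same pattern: give them a source, and they hand back a DataFrame.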

# <div class="alert alert-info">Chapter 2. Pandas Series : Querying and Indexing</div>


![series.jpg](attachment:series.jpg)

## <div class="alert alert-info">2.1 Introduction to Pandas Series</div>

<div class="alert alert-info">A pandas Series is like a row of labeled containers that all hold the same type of data, such as numbers or text. Just as people have names for easy identification, each container in a Series has a label (its index), making it easy to find and access the data inside.

* A pandas Series is a one-dimensional collection of labeled values.
* All values in a Series share one data type (its `dtype`), e.g., numbers or text.
* Each value has an index label for easy identification. </div>

<div class="alert alert-info">A pandas Series is like a collection of delightful ingredients. Imagine a treasure chest filled with colorful spices, each representing a single piece of data, such as sweet sugar, spicy chili, or tangy lemon. Just like our friends here have their favorite flavors, each ingredient in a Series has its own label, acting like a name tag for easy identification.</div>

<div class="alert alert-info">We can pick and choose these ingredients to create flavorful recipes. With the help of the labels, we can easily find and use the ingredients we need for our magical dishes. Just like skilled chefs, we can query specific ingredients, combine them to form a tasty blend, or even add new ingredients to our collection.</div>

<div class="alert alert-info">Our pantry of pandas Series is a versatile tool, making it easy for us to organize and manipulate data. Whether we're cooking up data analysis, preparing statistical recipes, or serving up insights, pandas Series allows us to conjure the perfect mix of ingredients to craft our data into delightful creations.</div>

In [2]:
# Creating a pandas Series

# a simple list of numbers
data = [10,20,30,40,50]

#Creating a Series
series = pd.Series(data)

#Displaying the Series
print(series)

0    10
1    20
2    30
3    40
4    50
dtype: int64


<div class="alert alert-info"><alert-heading><h2>2.2 Querying elements in a Series</h2>

Think of it as finding a friend in a crowd using their name tag.

We can use these labels to access individual values in the Series. Just mention the label, and the corresponding value will be at your fingertips!</div>

In [3]:
# Accessing elements by index label (the default labels are 0, 1, 2, ...)

print(series[0])
print(series[3])

10
40


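<div class="alert alert-info"><alert-heading><h2>2.3 Conditional Indexing</h2>

Sometimes, we want to find elements that meet a certain condition, like finding people whose age is greater than 30. A comparison such as `series > 30` produces a Series of True/False values, and indexing with it keeps only the elements where the value is True.</div>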

In [4]:
# accessing elements with a condition
print(series[series>30])

3    40
4    50
dtype: int64
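To peek behind the curtain, this small sketch prints the True/False mask on its own and then combines two conditions. It rebuilds the same `series` used above; the names `mask` and `between` are just illustrative.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])

# The comparison alone yields a boolean Series (the "mask")
mask = series > 30
print(mask)

# Masks can be combined with & (and), | (or), ~ (not);
# each condition goes inside its own parentheses
between = series[(series >= 20) & (series <= 40)]
print(between)
```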


<div class="alert alert-info"><alert-heading><h2>2.4 Mathematical Operations with Series</h2>

Just like performing arithmetic with regular numbers, we can use arithmetic operations on our Series in pandas.</div>

In [5]:
# Adding a number to each element of the Series
result = series + 5
print(result)

0    15
1    25
2    35
3    45
4    55
dtype: int64


In [6]:
# Adding the elements of one Series to the elements of another

result_addition = series + pd.Series([1, 2, 3, 4, 5])
print(result_addition)

0    11
1    22
2    33
3    44
4    55
dtype: int64


In [7]:
# Subtracting the elements of another Series from the elements of this Series
result_subtraction = series - pd.Series([1, 2, 3, 4, 5])
print(result_subtraction)

0     9
1    18
2    27
3    36
4    45
dtype: int64


In [8]:
# Multiplying elements of a series with elements of another series
result_multiplication = series * pd.Series([2, 3, 4, 5, 6])
print(result_multiplication)

0     20
1     60
2    120
3    200
4    300
dtype: int64


In [9]:
# Dividing the elements of one Series by the elements of another
result_division = series / pd.Series([2, 2, 2, 2, 2])
print(result_division)

0     5.0
1    10.0
2    15.0
3    20.0
4    25.0
dtype: float64


In [10]:
# Exponentiation
# each element of `series` is raised to the power of the matching element of the other Series
result_exponentiation = series ** pd.Series([2, 3, 4, 5, 6])
print(result_exponentiation)

0            100
1           8000
2         810000
3      102400000
4    15625000000
dtype: int64


In [11]:
# Modulo
result_modulo = series % pd.Series([3, 5, 7, 9, 11])
print(result_modulo)

0    1
1    0
2    2
3    4
4    6
dtype: int64


In [12]:
import numpy as np

# Applying NumPy functions element-wise (sin and cos interpret the values as radians)
result_sin = np.sin(series)
result_cos = np.cos(series)
result_log = np.log(series)
print(result_sin)
print(result_cos)
print(result_log)


0   -0.544021
1    0.912945
2   -0.988032
3    0.745113
4   -0.262375
dtype: float64
0   -0.839072
1    0.408082
2    0.154251
3   -0.666938
4    0.964966
dtype: float64
0    2.302585
1    2.995732
2    3.401197
3    3.688879
4    3.912023
dtype: float64
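One detail worth knowing about the arithmetic above: pandas pairs elements by index label, not by position. The examples so far all share the labels 0 to 4, so everything lines up. The sketch below uses two small made-up Series whose labels only partly overlap to show what happens otherwise.

```python
import pandas as pd

left = pd.Series([10, 20, 30], index=["a", "b", "c"])
right = pd.Series([1, 2, 3], index=["b", "c", "d"])

# Labels "a" and "d" exist on one side only, so their result is NaN
total = left + right
print(total)

# Series.add with fill_value treats the missing side as 0 instead
total_filled = left.add(right, fill_value=0)
print(total_filled)
```

If you ever see unexpected NaN values after combining two Series, mismatched labels are a likely culprit.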


<div class="alert alert-info"><alert-heading><h2>2.5 Named Indexing</h2>

We can give custom labels to the index and then use those labels to query values with ease. Just like calling our friends by their names, we can access specific elements directly through their labels.

You can choose any labels you want.</div>

In [13]:
# Giving custom labels to the index
indexed_series = pd.Series(data,index=["a","b","c","d","e"])
print(indexed_series)

a    10
b    20
c    30
d    40
e    50
dtype: int64


## <div class="alert alert-info">2.6 Accessing elements with named indexes</div>

In [14]:
# Accessing elements with named Indexes

print(indexed_series["c"])

30
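When a Series has named labels, it helps to be explicit about whether you mean a label or a position. A minimal sketch, rebuilding the same `indexed_series` as above, using the `.loc` (label) and `.iloc` (position) accessors:

```python
import pandas as pd

indexed_series = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

by_label = indexed_series.loc["c"]         # label-based lookup
by_position = indexed_series.iloc[2]       # position-based lookup (third element)

label_slice = indexed_series.loc["b":"d"]  # label slices include the end label
position_slice = indexed_series.iloc[1:3]  # position slices exclude the end

print(by_label, by_position)
print(label_slice)
print(position_slice)
```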


<div class="alert alert-info"><h2>2.7 Combining Series</h2>

Just like mixing different ingredients to craft a powerful spell, we can combine multiple Series into a new, potent DataFrame! Each Series becomes one column, and the rows are matched by index label.

It's like connecting pieces of a puzzle to unveil a bigger picture.</div>

In [15]:
#creating series 1
series1 = pd.Series([1,2,3])

#creating series 2
series2 = pd.Series([4,5,6])

#combining series
# giving columns names for dataframe and assigning the series
combined_series = pd.DataFrame({"col1" :series1 ,"col 2": series2})

#printing combined series
print(combined_series)

   col1  col 2
0     1      4
1     2      5
2     3      6


<div class="alert alert-info"><h2>2.8 Updating and Deleting</h2>

Updating Series:

To update a pandas Series, you can directly assign new values to specific elements using their index labels.

Deleting Series:

To delete elements from a pandas Series, you can use the `drop()` method and specify the labels to remove; it returns a new Series and leaves the original unchanged. Alternatively, you can use Python's `del` keyword, which removes the element in place.</div>
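The direct assignment and `del` described above can be sketched like this. The values are the same toy numbers used throughout this chapter, and the work happens on a copy so the original Series stays untouched.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])

updated = series.copy()          # work on a copy so the original stays intact
updated.loc[0] = 99              # assign a new value to the element labeled 0
updated.loc[[1, 2]] = [21, 31]   # update several labels at once

dropped = updated.drop([3])      # drop() returns a new Series without label 3
del updated[4]                   # del removes the element labeled 4 in place

print(updated)
print(dropped)
```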

### <div class="alert alert-info">2.8.1 Updating with Scalar Value</div>

In [16]:
# Updating with a scalar value: the operation is applied to every element
series_updated_scalar = series + 1
print(series_updated_scalar)

0    11
1    21
2    31
3    41
4    51
dtype: int64


### <div class="alert alert-info">2.8.2 Updating with Another Series</div>

In [17]:
# Updating with another Series: elements are paired by index label,
# so the second Series needs one value for each element of `series`
series_updated_another = series * pd.Series([2, 3, 4, 5, 6])
print(series_updated_another)

0     20
1     60
2    120
3    200
4    300
dtype: int64


### <div class="alert alert-info">2.8.3 Deleting with drop() method</div>

In [18]:
# Deleting with drop() method
series_deleted = series.drop(labels=[1, 3])
print(series_deleted)

0    10
2    30
4    50
dtype: int64


### <div class="alert alert-info">2.8.4 Deleting with boolean condition</div>

In [19]:
# Deleting with a boolean condition: keeping only the elements greater than 10
# drops every element that is 10 or smaller
series_deleted_condition = series[series > 10]
print(series_deleted_condition)

1    20
2    30
3    40
4    50
dtype: int64


# <div class="alert alert-warning">Chapter 3. Pandas DataFrame: Querying and Indexing</div>

![dataframe.jpg](attachment:dataframe.jpg)

## <div class="alert alert-warning">3.1 Introduction to Pandas DataFrame</div>

<div class="alert alert-warning">A pandas DataFrame is like a versatile canvas where rows and columns come together to create a beautiful arrangement of data. It's similar to a spreadsheet or a table, making it easy to work with complex datasets.

Key points about Pandas DataFrame:

* DataFrame is a data structure in pandas that stores data in a tabular format.
* It consists of rows and columns, resembling a table or spreadsheet.
* Rows represent individual records or observations, while columns represent different attributes or variables.
* DataFrame can hold diverse data types, including numbers, text, dates, and more.
* It allows us to perform various operations, such as filtering, sorting, and computation on data.
* DataFrames are widely used for data analysis, manipulation, and exploration in Python.</div>

<div class="alert alert-warning">Imagine it as a table where each row represents a different person, and each column contains specific information about them, such as their name, age, or city. With Pandas DataFrame, we can store and manipulate various data types, like numbers and text, making it perfect for handling complex datasets.</div>

    
<div class="alert alert-warning">Creating a DataFrame is like building a table from scratch, where we gather data and organize it in rows and columns. Once we have our DataFrame, we can perform various operations, like searching for specific information or sorting the data based on certain attributes.</div>

<div class="alert alert-warning">It becomes our data exploration companion, allowing us to analyze and uncover hidden patterns within our datasets. With Pandas DataFrame at our disposal, we can efficiently handle large volumes of data and embark on exciting data analysis adventures.</div>

In [20]:
# A dictionary with data

data = {
    "Name" : ["JJ","Yoyo","Cody","Cece"],
    "Age" : [25,30,22,27],
    "City" : ["New York","London","Paris","Tokyo"]
}

#creating a dataframe

df = pd.DataFrame(data)

# Displaying the dataframe

df

   Name  Age      City
0    JJ   25  New York
1  Yoyo   30    London
2  Cody   22     Paris
3  Cece   27     Tokyo


<div class="alert alert-warning"><h2>3.2 Querying and Filtering Rows</h2>
    
Just like spotting friends in a crowd, we can query and filter specific rows from our DataFrame by putting a condition inside the square brackets.</div>

In [21]:
# getting rows where age is greater than 25

print(df[df["Age"]>25])

   Name  Age    City
1  Yoyo   30  London
3  Cece   27   Tokyo


## <div class="alert alert-warning">3.3 Accessing rows</div>

In [22]:
# Taking out a specific row with a slice (position 0 up to, but not including, 1)

specific_row = df[0:1]

#display the row
specific_row

  Name  Age      City
0   JJ   25  New York


<div class="alert alert-warning"><h2>3.4 Accessing Columns</h2>

Every column in a DataFrame has a name, and we can access each column individually by that name. A single column comes back as a Series.</div>

In [23]:
# Accessing column "Name"

print(df["Name"])

0      JJ
1    Yoyo
2    Cody
3    Cece
Name: Name, dtype: object


<div class="alert alert-warning"><h2>3.5 "OR" Query</h2>

You can use the <mark>| (OR) operator</mark> to filter the rows that satisfy at least one of the given conditions. Wrap each condition in its own parentheses.</div>

In [24]:
# OR query to filter rows with age greater than 30 or city is 'London'
or_query = df[(df['Age'] > 30) | (df['City'] == 'London')]

# Displaying the DataFrame after applying the OR query
print(or_query)

   Name  Age    City
1  Yoyo   30  London


<div class="alert alert-warning"><h2>3.6 "AND" Query</h2>

You can use the <mark>& (AND) operator</mark> to filter the rows that satisfy all of the given conditions. As with OR, wrap each condition in its own parentheses.</div>

In [25]:
# AND query to filter rows with age greater than 25 and city is 'London'
and_query = df[(df['Age'] > 25) & (df['City'] == 'London')]

# Displaying the DataFrame after applying the AND query
print(and_query)

   Name  Age    City
1  Yoyo   30  London


<div class="alert alert-warning"><h2>3.7 "IN" query</h2>
    
You can use the <mark>isin()</mark> method to create an IN query, which filters rows based on whether a column value is present in a list of values. Values in the list that never occur in the column are simply ignored.</div>

In [26]:
# IN query to filter rows with specific names
names_to_filter = ['JJ', 'David', 'Cody']
result = df[df['Name'].isin(names_to_filter)]

# Displaying the DataFrame after applying the IN query
print(result)

   Name  Age      City
0    JJ   25  New York
2  Cody   22     Paris


<div class="alert alert-warning"><h2>3.8 Filtering Data with NOT (~) Operator</h2>

Filtering data with the <mark>NOT operator (~)</mark> is like excluding specific elements from your magical collection. It inverts a condition, so you get the rows that do not meet it.</div>

In [27]:
# Filtering data with NOT operator
filtered_data_not = df[~(df['City'] == 'Tokyo')]
print(filtered_data_not)

   Name  Age      City
0    JJ   25  New York
1  Yoyo   30    London
2  Cody   22     Paris


<div class="alert alert-warning"><h2>3.9 Combining Multiple Conditions</h2>
    
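In the world of pandas, you can create elaborate filtering spells by combining multiple conditions with `&`, `|`, and `~`. Wrap each condition in parentheses, because these operators bind more tightly than comparisons such as `>` or `==`. This allows you to filter your data in sophisticated and precise ways.</div>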

In [28]:
# Combining multiple conditions
filtered_data_complex = df[(df['Age'] > 25) & (df['City'].isin(['London', 'Paris']))]
print(filtered_data_complex)

   Name  Age    City
1  Yoyo   30  London
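Long filter expressions become easier to read when each condition gets its own name first. A small sketch, assuming the same four-person DataFrame from section 3.1; the variable names `older` and `in_europe` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["JJ", "Yoyo", "Cody", "Cece"],
    "Age": [25, 30, 22, 27],
    "City": ["New York", "London", "Paris", "Tokyo"],
})

# Store each condition as a named boolean mask
older = df["Age"] > 25
in_europe = df["City"].isin(["London", "Paris"])

both = df[older & in_europe]          # both conditions hold
either = df[older | in_europe]        # at least one condition holds
young_europe = df[~older & in_europe] # not older than 25, but in Europe

print(both)
print(either)
print(young_europe)
```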


<div class="alert alert-warning"><h2>3.10 Changing index</h2>

You can change the index with the <mark>set_index()</mark> method, which turns one or more existing columns into the index. To use a custom list of labels instead, pass it through the `index` argument when creating the DataFrame, as the second cell below does.</div>

In [29]:
# Setting the 'Name' column as the new index
df.set_index('Name', inplace=True)

# Displaying the DataFrame with the updated index
df

      Age      City
Name
JJ     25  New York
Yoyo   30    London
Cody   22     Paris
Cece   27     Tokyo


In [30]:
# Rebuilding the DataFrame from the original dictionary with custom index labels
df = pd.DataFrame(data,index = ["index1","index2","index3","index4"])
df

        Name  Age      City
index1    JJ   25  New York
index2  Yoyo   30    London
index3  Cody   22     Paris
index4  Cece   27     Tokyo


<div class="alert alert-warning"><h2>3.11 Reset Index</h2>

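If you want to replace the current index with the default integer-based index, you can use the <mark>reset_index()</mark> method. By default the old index is kept as a new column (named `index` in the output below); pass `drop=True` to discard it instead.</div>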

In [31]:
# Resetting the index to default integer-based index
df.reset_index(inplace=True)

# Displaying the DataFrame with the default index
df

    index  Name  Age      City
0  index1    JJ   25  New York
1  index2  Yoyo   30    London
2  index3  Cody   22     Paris
3  index4  Cece   27     Tokyo
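Notice the extra `index` column in the result: `reset_index()` keeps the old labels as a regular column. The sketch below, on a shortened two-row version of the same data, compares that default with the `drop=True` option, which throws the old labels away.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["JJ", "Yoyo"], "Age": [25, 30]},
    index=["index1", "index2"],
)

kept = df.reset_index()                # old labels become a column named "index"
discarded = df.reset_index(drop=True)  # old labels are discarded

print(kept)
print(discarded)
```

Both calls return a new DataFrame here, so `df` itself keeps its custom labels.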


<div class="alert alert-warning"><h2>3.12 Querying Specific Cells</h2>

Cells in our DataFrame are like hidden gems, waiting to be discovered. The <mark>at</mark> accessor returns a single cell, given a row label and a column name.</div>

In [32]:
# Accessing the cell in column "Name" at row label 1
print(df.at[1,'Name'])

Yoyo


<div class="alert alert-success"><h1>Chapter 4. Slicing Technique</h1>

Sometimes, we want to focus on a specific part of the canvas, like zooming in on a masterpiece.</div>

![slicing.jpg](attachment:slicing.jpg)

## <div class="alert alert-success">4.1 Introduction to Slicing Technique</div>

<div class="alert alert-success">In pandas, slicing is a technique that lets us extract specific rows or columns from a DataFrame. It selects a subset of the data based on criteria such as row indices or column names.

It is like cutting a delicious cake into smaller, tasty pieces. In the world of data, slicing helps us take specific parts of a dataset, just like picking out your favorite toppings from a pizza.</div>

<div class="alert alert-success">Imagine you have a big, colorful fruit salad with different fruits like apples, oranges, and grapes. You want to enjoy only the juicy oranges and sweet grapes without the apples.

* Slicing the fruit salad: Slicing is like using a magic knife to separate the oranges and grapes from the rest of the fruits. It allows you to pick just the fruits you love and leave the others behind.

* Selecting specific data: Similarly, in data, slicing helps us select only the information we need. We can choose specific rows or columns from a table, like getting the names and ages of kids without the adults.</div>

In [33]:
# Example DataFrame for row slicing

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 22, 27, 29],
    'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney']
}

df = pd.DataFrame(data)
print(df)


      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   22     Paris
3    David   27     Tokyo
4      Eve   29    Sydney


<div class="alert alert-success"><h2>4.1 Row Slicing</h2>

We can use slicing to extract specific rows from the table. For example, slicing from 1 to 4 gives us the rows at positions 1, 2, and 3; the end position is not included.</div>

In [34]:
# Row slicing

selected_rows = df[1:4]
print(selected_rows)


      Name  Age    City
1      Bob   30  London
2  Charlie   22   Paris
3    David   27   Tokyo


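<div class="alert alert-success"><h2>4.2 Column slicing</h2>

Column slicing means choosing specific columns, just like picking our favorite portions from the canvas. Passing a list of column names inside the brackets returns a DataFrame with only those columns.</div>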

In [35]:
# Column slicing
selected_columns = df[['Name', 'City']]
print(selected_columns)


      Name      City
0    Alice  New York
1      Bob    London
2  Charlie     Paris
3    David     Tokyo
4      Eve    Sydney


<div class="alert alert-success"><h2>4.3 Conditional slicing</h2>

We can also perform conditional slicing, where we select only the rows that meet certain criteria.</div>

In [36]:
# Conditional slicing
selected_rows_condition = df[df['Age'] > 25]
print(selected_rows_condition)


    Name  Age    City
1    Bob   30  London
3  David   27   Tokyo
4    Eve   29  Sydney


<div class="alert alert-success"><h2>4.4 Combining row and column slicing</h2>
    
Sometimes, we want to slice both rows and columns, like creating a personalized collection of art.</div>

In [37]:
# Combining row and column slicing (loc includes the end label, so row 3 is returned)
selected_rows_and_columns = df.loc[1:3, ['Name', 'City']]
print(selected_rows_and_columns)


      Name    City
1      Bob  London
2  Charlie   Paris
3    David   Tokyo


<div class="alert alert-success"><h2>4.5 Setting a New Index</h2>

Setting a meaningful index is like giving a name to each row, making it easier to find later.</div>

In [38]:
# Setting a new index
df.set_index('Name', inplace=True)

#displaying the dataframe
df

         Age      City
Name
Alice     25  New York
Bob       30    London
Charlie   22     Paris
David     27     Tokyo
Eve       29    Sydney


<div class="alert alert-success"><h2>4.6 Conditional slicing with new index</h2>

With a new index, conditional slicing works the same way as before, and the resulting rows are labeled by name instead of by number.</div>

In [39]:
# Conditional slicing with new index
selected_rows_new_index = df.loc[df['Age'] > 25]
print(selected_rows_new_index)


       Age    City
Name              
Bob     30  London
David   27   Tokyo
Eve     29  Sydney


<div class="alert alert-success"><h2>4.7 Resetting the Index</h2>

Sometimes, we need to reset the index to its default numerical values. The old index (here, `Name`) becomes a regular column again.</div>

In [40]:
# Resetting the index
df.reset_index(inplace=True)

#displaying the dataframe
print(df)


      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   22     Paris
3    David   27     Tokyo
4      Eve   29    Sydney


<div class="alert alert-success"><h2>4.8 iloc</h2>

<strong>iloc</strong> is like a teleportation spell that lets you access data by its numerical position. It stands for "integer location" and is used with square brackets rather than called like a function. With iloc, you can reach into your DataFrame and pluck out specific rows and columns based on their positions, counting from 0.</div>

<div class="alert alert-success"><alert-heading><h3>4.8.1 Using iloc to access rows</h3>

You can access specific rows in your DataFrame by their numerical position. It's like reaching into your DataFrame and grabbing the rows at particular positions.</div>

In [41]:
# Using iloc to access rows
row_iloc = df.iloc[1]  # Accessing the second row (index 1)
print(row_iloc)


Name       Bob
Age         30
City    London
Name: 1, dtype: object


<div class="alert alert-success"><h3>4.8.2 Slicing rows</h3></div>

In [42]:
# Slicing rows by position (the end position 3 is excluded)
print(df.iloc[1:3])

      Name  Age    City
1      Bob   30  London
2  Charlie   22   Paris


<div class="alert alert-success"><alert-heading><h3>4.8.3 Using iloc to Access Columns</h3>

iloc also allows you to access specific columns in your DataFrame by their numerical position. It's like plucking out specific columns based on where they sit.</div>

In [43]:
# Using iloc to access columns
column_iloc = df.iloc[:, 1]  # Accessing the second column (index 1)
print(column_iloc)


0    25
1    30
2    22
3    27
4    29
Name: Age, dtype: int64


<div class="alert alert-success"><alert-heading><h3>4.8.4 Using iloc to Access Specific Cells</h3>

We can also target specific cells in our DataFrame by their row and column positions.</div>

In [44]:
# Using iloc to access specific cells
cell_iloc = df.iloc[3, 2]  # Accessing the cell at row position 3, column position 2 (fourth row, third column)
print(cell_iloc)


Tokyo


<div class="alert alert-success"><h2>4.9 loc</h2>

loc is like a treasure map that lets you access data by its labels. It stands for "location" and selects rows and columns using label-based indexing. Unlike iloc, a loc slice includes its end label. With loc, you can navigate your DataFrame using its labels, making data selection more intuitive and expressive.</div>

<div class="alert alert-success"><alert-heading><h3>4.9.1 Using loc to Access Rows</h3>

With loc, you can access specific rows in your DataFrame using their labels. It's like finding the rows based on their names.</div>

In [45]:
# Using loc to access rows
row_loc = df.loc[1]  # Accessing the row with label 1
print(row_loc)


Name       Bob
Age         30
City    London
Name: 1, dtype: object


In [46]:
# Slicing columns by label (both "Name" and "City" are included)
print(df.loc[:, "Name": "City"])

      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   22     Paris
3    David   27     Tokyo
4      Eve   29    Sydney


<div class="alert alert-success"><alert-heading><h3>4.9.2 Using loc to Access Columns</h3>

loc also allows you to access specific columns in your DataFrame using their labels. It's like picking out the columns based on their names.</div>

In [47]:
# Using loc to access columns
column_loc = df.loc[:, 'Age']  # Accessing the 'Age' column
print(column_loc)


0    25
1    30
2    22
3    27
4    29
Name: Age, dtype: int64


<div class="alert alert-success"><alert-heading><h3>4.9.3 Using loc to Access Specific Cells</h3>

With loc, you can target specific cells in your DataFrame using both row and column labels. It's like discovering hidden treasures within your DataFrame.</div>

In [48]:
# Using loc to access specific cells
cell_loc = df.loc[3, 'City']  # Accessing the cell at row label 3 and column label 'City'
print(cell_loc)


Tokyo


# <div class="alert alert-info">Chapter 5. Merging DataFrames</div>

![data%20merging.jpg](attachment:data%20merging.jpg)

## <div class="alert alert-info">5.1 Introduction to Merging DataFrames</div>

<div class="alert alert-info">Data merging is like combining puzzle pieces to create a complete picture. In the world of data, we have different datasets with related information, and data merging helps us bring them together to form a unified and comprehensive dataset.

Imagine you have two lists of friends, one with their names and another with their favorite colors. You want to create a new list with both names and favorite colors side by side.

* Merging the friend lists: Data merging is like combining the two lists to create a new list that includes both names and favorite colors. This way, you have all the information you need about your friends in one place.

* Matching by a common attribute: Just like matching friends in both lists by their names, data merging looks for a common attribute (e.g., an ID or name) to connect information from different datasets.</div>
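The two friend lists described above can be sketched as follows. All names, IDs, and colors are made up for illustration; `ID` plays the role of the common attribute.

```python
import pandas as pd

# Hypothetical friend lists sharing an "ID" column
names = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
colors = pd.DataFrame({"ID": [1, 2, 3], "Color": ["red", "green", "blue"]})

# Rows with the same ID are placed side by side
friends = pd.merge(names, colors, on="ID")
print(friends)
```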

In [64]:
# Example DataFrames for merging
import pandas as pd

# DataFrame 1
data1 = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'City': ['New York', 'London', 'Paris']
}

df1 = pd.DataFrame(data1)

# DataFrame 2
data2 = {
    'Name': ['David', 'Eve'],
    'Age': [27, 29],
    'City': ['Tokyo', 'Sydney']
}

df2 = pd.DataFrame(data2)

print(df1)
print(df2)


      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   22     Paris
    Name  Age    City
0  David   27   Tokyo
1    Eve   29  Sydney


<div class="alert alert-info"><h2>5.2 Concatenating DataFrames</h2>

Concatenating DataFrames is like stacking one scroll after another. It's perfect when we want to combine multiple DataFrames with the same columns. Each DataFrame keeps its own row labels, which is why 0 and 1 appear twice in the output below.</div>

In [65]:
# Concatenating DataFrames
concatenated_df = pd.concat([df1, df2])
print(concatenated_df)


      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   22     Paris
0    David   27     Tokyo
1      Eve   29    Sydney
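If the repeated row labels 0 and 1 in the result are unwanted, `pd.concat` can renumber the rows. A minimal sketch with a two-column version of the same data, using the `ignore_index=True` option:

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 22]})
df2 = pd.DataFrame({"Name": ["David", "Eve"], "Age": [27, 29]})

# ignore_index=True discards the old labels and numbers the rows 0..n-1
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)
```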


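<div class="alert alert-info"><h2>5.3 Merging DataFrames based on Columns</h2>

Merging based on columns is like weaving threads of data from two different tales. We use a common column to merge the DataFrames, creating a magical blend of information. By default only rows whose key appears in both DataFrames are kept, so the example below returns an empty DataFrame: df1 and df2 have no names in common.</div>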

In [66]:
# Merging DataFrames based on columns
merged_df = pd.merge(df1, df2, on='Name')
print(merged_df)


Empty DataFrame
Columns: [Name, Age_x, City_x, Age_y, City_y]
Index: []


<div class="alert alert-info"><h2>5.4 Merging DataFrames with Different Column Names</h2>

Sometimes, the columns in the DataFrames may have different names, like different languages spoken by creatures in DataLand. But worry not! We can still merge them by naming the key column on each side with `left_on` and `right_on`.</div>

In [52]:
# Merging DataFrames with different column names
data3 = {
    'Full Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [90, 85, 78]
}

df3 = pd.DataFrame(data3)

merged_df_diff_names = pd.merge(df1, df3, left_on='Name', right_on='Full Name')
print(merged_df_diff_names)


      Name  Age      City Full Name  Score
0    Alice   25  New York     Alice     90
1      Bob   30    London       Bob     85
2  Charlie   22     Paris   Charlie     78


<div class="alert alert-info"><h2>5.5 Joining DataFrames</h2>

Joining DataFrames is like combining magical artifacts with complementary powers. We join DataFrames based on their indices, creating a powerful and unified dataset.</div>

In [67]:
# Joining DataFrames
df1.set_index('Name', inplace=True)
df2.set_index('Name', inplace=True)

# join() matches rows on the index, so df3 must be indexed by name as well
joined_df = df1.join(df3.set_index('Full Name'))
print(joined_df)


         Age      City  Score
Name
Alice     25  New York     90
Bob       30    London     85
Charlie   22     Paris     78


## <div class="alert alert-info">5.6 JOINS in Pandas</div>

![joins.jpg](attachment:joins.jpg)

<div class="alert alert-info">Joins are like merging puzzle pieces to create a complete picture.
Joins are a fundamental tool for data integration and exploration, allowing you to construct a complete story from scattered pieces of information.

##### The four common types of joins:
  * inner join
  * outer join
  * left join
  * right join</div>
  

In [62]:
# Example DataFrames for Join Operations
import pandas as pd

# DataFrame 1
data1 = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
}
df11 = pd.DataFrame(data1)

# DataFrame 2
data2 = {
    'ID': [3, 4, 5, 6, 7],
    'Age': [25, 30, 22, 27, 29],
}
df22 = pd.DataFrame(data2)

print("DataFrame 1:")
print(df11)
print("\nDataFrame 2:")
print(df22)


DataFrame 1:
   ID     Name
0   1    Alice
1   2      Bob
2   3  Charlie
3   4    David
4   5      Eve

DataFrame 2:
   ID  Age
0   3   25
1   4   30
2   5   22
3   6   27
4   7   29


<div class="alert alert-info"><h3>5.6.1 Inner Join</h3>
    
An inner join is like keeping only the pieces that fit perfectly in both puzzles. It merges the DataFrames based on the common values in the specified column(s).</div>

In [70]:
# Inner Join
inner_join_df = pd.merge(df11, df22, on='ID', how='inner')
print("\nInner Join:")
print(inner_join_df)



Inner Join:
   ID     Name  Age
0   3  Charlie   25
1   4    David   30
2   5      Eve   22


<div class="alert alert-info"><h3>5.6.2 Outer Join</h3>

An outer join is like combining both puzzles while filling missing pieces with empty spaces. It keeps all rows from both DataFrames and fills the gaps in non-matching rows with NaN. Because NaN is a floating-point value, the `Age` column is shown as float in the output below.</div>

In [71]:
# Outer Join
outer_join_df = pd.merge(df11, df22, on='ID', how='outer')
print("\nOuter Join:")
print(outer_join_df)



Outer Join:
   ID     Name   Age
0   1    Alice   NaN
1   2      Bob   NaN
2   3  Charlie  25.0
3   4    David  30.0
4   5      Eve  22.0
5   6      NaN  27.0
6   7      NaN  29.0


<div class="alert alert-info"><h3>5.6.3 Left Join</h3>

A left join keeps every piece from the left puzzle and adds the matching pieces from the right puzzle. It keeps all rows from the left DataFrame; where the right DataFrame has no match, its columns are filled with NaN.</div>

In [72]:
# Left Join
left_join_df = pd.merge(df11, df22, on='ID', how='left')
print("\nLeft Join:")
print(left_join_df)



Left Join:
   ID     Name   Age
0   1    Alice   NaN
1   2      Bob   NaN
2   3  Charlie  25.0
3   4    David  30.0
4   5      Eve  22.0


<div class="alert alert-info"><h3>5.6.4 Right Join</h3>

A right join keeps every piece from the right puzzle and adds the matching pieces from the left puzzle. It keeps all rows from the right DataFrame; where the left DataFrame has no match, its columns are filled with NaN.</div>

In [73]:
# Right Join
right_join_df = pd.merge(df11, df22, on='ID', how='right')
print("\nRight Join:")
print(right_join_df)



Right Join:
   ID     Name  Age
0   3  Charlie   25
1   4    David   30
2   5      Eve   22
3   6      NaN   27
4   7      NaN   29


<div class="alert alert-info"><h2>5.7 Handling Common Merging Issues</h2>
    
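Pandas also offers solutions for common merging issues, such as duplicate column names and rows without a match. In the example below both DataFrames have Age and City columns, so the suffixes argument tells them apart (the default suffixes are _x and _y). Because df1 and df2 share no names, the default inner merge returns an empty DataFrame, while how='outer' keeps every row and fills the gaps with NaN.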

In [69]:
# Handling common merging issues
df1['Age'] = [25, 30, 22]
df2['Age'] = [27, 29]

merged_duplicate_df = pd.merge(df1, df2, on='Name', suffixes=('_left', '_right'))
print(merged_duplicate_df)

merged_outer_df = pd.merge(df1, df2, on='Name', how='outer')
print(merged_outer_df)


Empty DataFrame
Columns: [Age_left, City_left, Age_right, City_right]
Index: []
         Age_x    City_x  Age_y  City_y
Name                                   
Alice     25.0  New York    NaN     NaN
Bob       30.0    London    NaN     NaN
Charlie   22.0     Paris    NaN     NaN
David      NaN       NaN   27.0   Tokyo
Eve        NaN       NaN   29.0  Sydney
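
The suffixes only show real data when the two frames share at least one key. Here is a minimal sketch with two made-up frames (`people_a` and `people_b` are hypothetical) that both contain Bob:

```python
import pandas as pd

# Hypothetical frames: both have 'Age' and 'City', and both contain 'Bob'
people_a = pd.DataFrame({'Name': ['Alice', 'Bob'],
                         'Age': [25, 30],
                         'City': ['New York', 'London']})
people_b = pd.DataFrame({'Name': ['Bob', 'Eve'],
                         'Age': [31, 29],
                         'City': ['Leeds', 'Sydney']})

# The default inner join keeps only 'Bob'; clashing columns get the suffixes
overlap = pd.merge(people_a, people_b, on='Name', suffixes=('_left', '_right'))
print(overlap)
```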


# <div class="alert alert-warning">Chapter 6. Group By Operation

<div class="alert alert-warning"><h2>6.1 Introduction to Group By Operation</h2>
    
GroupBy is a Pandas operation that lets us split data into groups based on a specific criterion, such as a column, and perform operations on each group independently.
    
    
Group By is like a special gathering: Imagine we have a bunch of animals, and we want to see how many of them have the same fur color. We gather animals with similar fur colors together to study them.
    
Grouping data based on specific columns: In our data world, we can do something similar. We can group our data based on certain categories, like grouping people by their ages, or cities by their populations.</div>

In [74]:
# Example DataFrame for Group By operation
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Cece', 'cody', 'Tom'],
    'Age': [25, 30, 22, 27, 29],
    'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney']
}

df = pd.DataFrame(data)
print(df)


    Name  Age      City
0  Alice   25  New York
1    Bob   30    London
2   Cece   22     Paris
3   cody   27     Tokyo
4    Tom   29    Sydney


<div class="alert alert-warning"><h2>6.2 Grouping by a Single Column</h2>
    
We can start by grouping our DataFrame based on a single column. It's like gathering creatures based on their abilities.

In [75]:
# Grouping by a single column
grouped_single_column = df.groupby('City')
print(grouped_single_column.groups)


{'London': [1], 'New York': [0], 'Paris': [2], 'Sydney': [4], 'Tokyo': [3]}


<div class="alert alert-warning"><h2>6.3  Grouping by Multiple Columns</h2>
    
Just like forming clans based on multiple traits, we can group by multiple columns.</div>

In [76]:
# Grouping by multiple columns
grouped_multiple_columns = df.groupby(['City', 'Age'])
print(grouped_multiple_columns.groups)


{('London', 30): [1], ('New York', 25): [0], ('Paris', 22): [2], ('Sydney', 29): [4], ('Tokyo', 27): [3]}


<div class="alert alert-warning"><h2>6.4 Aggregating Data within Groups</h2>
    
After we've formed our groups, we can perform calculations within each clan. It's like discovering the average age or the total number of creatures in each group.

In [77]:
# Aggregating data within groups
aggregated_data = grouped_single_column['Age'].mean()
print(aggregated_data)


City
London      30.0
New York    25.0
Paris       22.0
Sydney      29.0
Tokyo       27.0
Name: Age, dtype: float64


<div class="alert alert-warning"><h2>6.5 Applying Multiple Aggregations</h2>
    
We can also apply multiple enchanting spells to our groups, like finding both the minimum and maximum age in each clan.

In [78]:
# Applying multiple aggregations
multiple_aggregations = grouped_single_column['Age'].agg(['min', 'max'])
print(multiple_aggregations)


          min  max
City              
London     30   30
New York   25   25
Paris      22   22
Sydney     29   29
Tokyo      27   27


<div class="alert alert-warning"><h2>6.6 Custom Aggregations with Functions</h2>
    
Sometimes, we need to craft our own unique spells. Custom aggregations allow us to use our own functions for calculations.

In [79]:
# Custom aggregations with functions
def age_difference(group):
    return group.max() - group.min()

custom_aggregation = grouped_single_column['Age'].agg(age_difference)
print(custom_aggregation)


City
London      0
New York    0
Paris       0
Sydney      0
Tokyo       0
Name: Age, dtype: int64
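
Every city in the example holds a single person, so each age difference above is 0. With a made-up frame in which cities repeat (the `people` data is hypothetical), the same function gives more interesting results:

```python
import pandas as pd

# Hypothetical data: two people per city so each group has a real spread
people = pd.DataFrame({
    'Name': ['Ann', 'Ben', 'Cal', 'Dee'],
    'Age': [25, 35, 22, 28],
    'City': ['London', 'London', 'Paris', 'Paris']
})

def age_difference(group):
    # Spread between the oldest and youngest member of the group
    return group.max() - group.min()

age_range = people.groupby('City')['Age'].agg(age_difference)
print(age_range)
```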


<div class="alert alert-warning"><h2>6.7 Transforming Data within Groups</h2>
    
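Transforming data is like casting spells that change the characteristics of each creature within the group. Unlike an aggregation, transform returns one value for every original row, so the result can be stored as a new column.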

In [80]:
# Transforming data within groups
df['Age Difference'] = grouped_single_column['Age'].transform(age_difference)
print(df)


    Name  Age      City  Age Difference
0  Alice   25  New York               0
1    Bob   30    London               0
2   Cece   22     Paris               0
3   cody   27     Tokyo               0
4    Tom   29    Sydney               0


<div class="alert alert-warning"><h2>6.8 Filtering Data within Groups</h2>

We can also use magical filters to keep only the creatures that meet specific criteria within each group.

In [81]:
# Filtering data within groups
filtered_data = grouped_single_column.filter(lambda x: x['Age'].mean() > 25)
print(filtered_data)


   Name  Age    City  Age Difference
1   Bob   30  London               0
3  cody   27   Tokyo               0
4   Tom   29  Sydney               0
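
The difference between transforming and filtering is easier to see when groups have more than one member. In this sketch (again with made-up data), `transform` hands back one value per row, while `filter` keeps or drops whole groups:

```python
import pandas as pd

# Hypothetical data with two people per city
people = pd.DataFrame({
    'Name': ['Ann', 'Ben', 'Cal', 'Dee'],
    'Age': [25, 35, 22, 28],
    'City': ['London', 'London', 'Paris', 'Paris']
})
by_city = people.groupby('City')

# transform: one result per original row (the mean age of that row's city)
city_mean = by_city['Age'].transform('mean')
print(city_mean)

# filter: keep every row of the groups whose mean age is above 26
older_cities = by_city.filter(lambda g: g['Age'].mean() > 26)
print(older_cities)
```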


<div class="alert alert-success"><h1>Chapter 7 - Pivot Table</h1>
    
The Pivot Table is a powerful and versatile tool for transforming data into captivating summaries.</div>

![pivot.jpg](attachment:pivot.jpg)

## <div class="alert alert-success">7.1 Introduction to Pivot Table

<div class="alert alert-success">A pivot table is like a magical organizer that helps us make sense of messy data. It allows us to restructure and summarize information, making it easier to see patterns and trends. Just like arranging your toys in different groups to find your favorite ones quickly, a pivot table helps us organize and analyze data efficiently.


<div class="alert alert-success">Imagine you have a big box of colorful toys, each with a different shape and size. You want to know how many toys of each type you have and their total count. But counting them one by one seems overwhelming!

* Creating a pivot table: A pivot table is like a special toy organizer that groups similar toys together and counts them automatically. It saves you time and effort by doing the counting for you.

* Grouping and summarizing data: The pivot table takes all the toys and groups them by their shapes, then provides a neat summary of how many toys are in each group.
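
The toy analogy can be written out directly. The sketch below uses a made-up `toys` frame and `aggfunc='count'` to count the toys of each shape:

```python
import pandas as pd

# Hypothetical toy box
toys = pd.DataFrame({
    'Toy': ['ball', 'marble', 'block', 'dice', 'globe'],
    'Shape': ['round', 'round', 'cube', 'cube', 'round']
})

# Group the toys by shape and count how many fall into each group
toy_counts = toys.pivot_table(index='Shape', values='Toy', aggfunc='count')
print(toy_counts)
```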

In [82]:
# Example DataFrame for Pivot Table
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Alice', 'Charlie'],
    'Subject': ['Math', 'Science', 'Math', 'Science', 'Math', 'Science', 'Science'],
    'Score': [85, 90, 78, 88, 92, 85, 80]
}

df = pd.DataFrame(data)
print(df)


      Name  Subject  Score
0    Alice     Math     85
1      Bob  Science     90
2  Charlie     Math     78
3    David  Science     88
4      Eve     Math     92
5    Alice  Science     85
6  Charlie  Science     80


<div class="alert alert-success"><h2>7.2 Creating a Basic Pivot Table</h2>

We can start by creating a basic Pivot Table to summarize our data. It's like brewing a potion that brings together the average score of each student in each subject.

In [83]:
# Creating a basic Pivot Table
pivot_table_basic = df.pivot_table(index='Name', columns='Subject', values='Score', aggfunc='mean')
print(pivot_table_basic)


Subject  Math  Science
Name                  
Alice    85.0     85.0
Bob       NaN     90.0
Charlie  78.0     80.0
David     NaN     88.0
Eve      92.0      NaN
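
A pivot table can be thought of as a group-by followed by a reshape. As a rough sketch (the example data is rebuilt under the name `scores` so the block runs on its own), grouping by both columns and calling `unstack()` should give the same numbers:

```python
import pandas as pd

scores = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Alice', 'Charlie'],
    'Subject': ['Math', 'Science', 'Math', 'Science', 'Math', 'Science', 'Science'],
    'Score': [85, 90, 78, 88, 92, 85, 80]
})

pivoted = scores.pivot_table(index='Name', columns='Subject',
                             values='Score', aggfunc='mean')

# Same summary built by hand: group, average, then move 'Subject' into the columns
by_hand = scores.groupby(['Name', 'Subject'])['Score'].mean().unstack()
print(by_hand)
```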


## <div class="alert alert-success">7.3 Handling Missing Values

In [84]:
# Handling missing values in the Pivot Table
pivot_table_missing = df.pivot_table(index='Name', columns='Subject', values='Score', aggfunc='mean', fill_value=0)
print(pivot_table_missing)


Subject  Math  Science
Name                  
Alice      85       85
Bob         0       90
Charlie    78       80
David       0       88
Eve        92        0


<div class="alert alert-success"><h2>7.4 More Aggregation Functions</h2>

We have an array of enchanting aggregation functions at our disposal. We can find the sum, minimum, maximum, and many more.

In [85]:
# Using different aggregation functions in the Pivot Table
pivot_table_aggregation = df.pivot_table(index='Name', columns='Subject', values='Score', aggfunc=['mean', 'sum', 'min', 'max'])
print(pivot_table_aggregation)


         mean           sum           min           max        
Subject  Math Science  Math Science  Math Science  Math Science
Name                                                           
Alice    85.0    85.0  85.0    85.0  85.0    85.0  85.0    85.0
Bob       NaN    90.0   NaN    90.0   NaN    90.0   NaN    90.0
Charlie  78.0    80.0  78.0    80.0  78.0    80.0  78.0    80.0
David     NaN    88.0   NaN    88.0   NaN    88.0   NaN    88.0
Eve      92.0     NaN  92.0     NaN  92.0     NaN  92.0     NaN


<div class="alert alert-success"><h2>7.5 Multilevel Pivot Table</h2>
    
Experienced data wizards can create multilevel Pivot Tables, much like skilled artisans crafting detailed structures. These tables offer deep insights with multiple layers of information.

In [86]:
# Creating a multilevel Pivot Table
pivot_table_multilevel = df.pivot_table(index=['Name', 'Subject'], values='Score', aggfunc='mean')
print(pivot_table_multilevel)


                 Score
Name    Subject       
Alice   Math        85
        Science     85
Bob     Science     90
Charlie Math        78
        Science     80
David   Science     88
Eve     Math        92


<div class="alert alert-success"><h2>7.6 Pivot Table with Margins</h2>
    
We can even add margins to our Pivot Table, like including the overall mean scores.

In [87]:
# Adding margins to the Pivot Table
pivot_table_with_margins = df.pivot_table(index='Name', columns='Subject', values='Score', aggfunc='mean', margins=True)
print(pivot_table_with_margins)


Subject  Math  Science        All
Name                             
Alice    85.0    85.00  85.000000
Bob       NaN    90.00  90.000000
Charlie  78.0    80.00  79.000000
David     NaN    88.00  88.000000
Eve      92.0      NaN  92.000000
All      85.0    85.75  85.428571


# <div class="alert alert-info">Chapter 8. Date Time Functionality

![date%20time.jpg](attachment:date%20time.jpg)

## <div class="alert alert-info">8.1 Introduction to Date Time Functionality

<div class="alert alert-info">Date Time Functionality is like having a time-traveling machine in the world of data. It enables us to work with dates, times, and time intervals, just like we do in our daily lives, making it easier to analyze time-based data and perform various operations.

<div class="alert alert-info">Imagine you have a  diary that records all your daily activities, such as when you wake up, eat meals, and go to bed. You want to analyze your daily routine to find patterns and understand how your habits change over time.

* Date Time Functionality: Date Time Functionality is like using a special tool to work with the dates and times in your diary. It helps you perform tasks like finding the time difference between activities or grouping activities by the day of the week.

* Analyzing Time-based Data: With Date Time Functionality, you can effortlessly analyze your daily routine, such as calculating how much time you spend on different activities or identifying your most active days of the week.

In [89]:
# Example DataFrame for Date Time Functionality
import pandas as pd

data = {
    'Event': ['Event1', 'Event2', 'Event3', 'Event4', 'Event5'],
    'Date': ['2023-07-15', '2023-07-16', '2023-07-17', '2023-07-18', '2023-07-19'],
    'Sales': [100, 150, 200, 180, 220]
}

df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
print(df)


    Event       Date  Sales
0  Event1 2023-07-15    100
1  Event2 2023-07-16    150
2  Event3 2023-07-17    200
3  Event4 2023-07-18    180
4  Event5 2023-07-19    220


<div class="alert alert-info"><h2>8.2 Creating a Date Range</h2>

The pandas date_range() function generates a sequence of dates within a specified period, making it easy to create date ranges for analysis.

In [90]:
# Creating a Date Range
date_range = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D')
print("Date Range:")
print(date_range)
print()

Date Range:
DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
               '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08',
               '2023-01-09', '2023-01-10',
               ...
               '2023-12-22', '2023-12-23', '2023-12-24', '2023-12-25',
               '2023-12-26', '2023-12-27', '2023-12-28', '2023-12-29',
               '2023-12-30', '2023-12-31'],
              dtype='datetime64[ns]', length=365, freq='D')
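
The `freq` argument is not limited to single days. A small sketch of two other options, using `periods` instead of an end date and the business-day alias `'B'`:

```python
import pandas as pd

# Four dates spaced seven days apart, starting on 2023-01-01
weekly = pd.date_range(start='2023-01-01', periods=4, freq='7D')
print(weekly)

# Business days (Monday to Friday) in the first week of 2023
business_days = pd.date_range(start='2023-01-01', end='2023-01-07', freq='B')
print(business_days)
```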



<div class="alert alert-info"><h2>8.3 Extracting Date Components</h2>
    
We can extract components of dates like year, month, day, and more.

In [91]:
# Extracting date components

# Extracting only the year
df['Year'] = df['Date'].dt.year

# Extracting only the month
df['Month'] = df['Date'].dt.month

# Extracting only the day
df['Day'] = df['Date'].dt.day

# Extracting only the minute (0 here, because the dates have no time part)
df['Minute'] = df['Date'].dt.minute

# Extracting only the second
df['Seconds'] = df['Date'].dt.second
print(df)


    Event       Date  Sales  Year  Month  Day  Minute  Seconds
0  Event1 2023-07-15    100  2023      7   15       0        0
1  Event2 2023-07-16    150  2023      7   16       0        0
2  Event3 2023-07-17    200  2023      7   17       0        0
3  Event4 2023-07-18    180  2023      7   18       0        0
4  Event5 2023-07-19    220  2023      7   19       0        0
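
Beyond the numeric parts, the `.dt` accessor can also tell us about weekdays. A short sketch on the first three dates of the example (rebuilt here as a standalone Series named `dates`):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2023-07-15', '2023-07-16', '2023-07-17']))

# Name of the weekday for each date
day_names = dates.dt.day_name()
print(day_names)

# Monday is 0 and Sunday is 6, so values of 5 or 6 mark a weekend
is_weekend = dates.dt.dayofweek >= 5
print(is_weekend)
```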


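<div class="alert alert-info"><h2>8.4 Counting Days, Months, Years, Hours, Minutes, and Seconds</h2>

Each .dt accessor returns the matching component of every date (for example, the day of the month), so the Series printed below list those values row by row rather than tallying them. Because our dates carry no time of day, the hour, minute, and second are all 0.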

In [92]:
print("Count of Days:")
print(df['Date'].dt.day)
print()

Count of Days:
0    15
1    16
2    17
3    18
4    19
Name: Date, dtype: int32



In [93]:
print("Count of Months:")
print(df['Date'].dt.month)
print()

Count of Months:
0    7
1    7
2    7
3    7
4    7
Name: Date, dtype: int32



In [94]:
print("Count of Years:")
print(df['Date'].dt.year)
print()

Count of Years:
0    2023
1    2023
2    2023
3    2023
4    2023
Name: Date, dtype: int32



In [95]:
print("Count of Hours:")
print(df['Date'].dt.hour)
print()

Count of Hours:
0    0
1    0
2    0
3    0
4    0
Name: Date, dtype: int32



In [96]:
print("Count of Minutes:")
print(df['Date'].dt.minute)
print()

Count of Minutes:
0    0
1    0
2    0
3    0
4    0
Name: Date, dtype: int32



In [97]:
print("Count of Seconds:")
print(df['Date'].dt.second)
print()

Count of Seconds:
0    0
1    0
2    0
3    0
4    0
Name: Date, dtype: int32
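
If an actual count is what you need, a few options are sketched here on the same five dates (rebuilt as a standalone Series named `dates`):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(
    ['2023-07-15', '2023-07-16', '2023-07-17', '2023-07-18', '2023-07-19']))

# Number of days between the first and last date
span_in_days = (dates.max() - dates.min()).days

# How many distinct months appear in the column
distinct_months = dates.dt.month.nunique()

# How many rows fall in each month
rows_per_month = dates.dt.month.value_counts()

print(span_in_days, distinct_months)
print(rows_per_month)
```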



<div class="alert alert-info"><h2>8.5 Filtering Data with Dates</h2>

We can filter our data using enchanting time criteria. It's like selecting events that happened within a specific time period.

In [98]:
# Filtering data with dates
start_date = '2023-07-16'
end_date = '2023-07-18'
filtered_df = df[(df['Date'] >= start_date) & (df['Date'] <= end_date)]
print(filtered_df)


    Event       Date  Sales  Year  Month  Day  Minute  Seconds
1  Event2 2023-07-16    150  2023      7   16       0        0
2  Event3 2023-07-17    200  2023      7   17       0        0
3  Event4 2023-07-18    180  2023      7   18       0        0
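
The same window can be expressed with `Series.between`, which includes both end points by default. A minimal sketch on a rebuilt copy of the events (named `events` here):

```python
import pandas as pd

events = pd.DataFrame({
    'Event': ['Event1', 'Event2', 'Event3', 'Event4', 'Event5'],
    'Date': pd.to_datetime(['2023-07-15', '2023-07-16', '2023-07-17',
                            '2023-07-18', '2023-07-19'])
})

# between() keeps rows from the start date through the end date, inclusive
in_window = events[events['Date'].between(pd.Timestamp('2023-07-16'),
                                          pd.Timestamp('2023-07-18'))]
print(in_window)
```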


<div class="alert alert-info"><h2>8.6 Resampling Time Series Data</h2>

Resampling is like shaping the flow of time. We can transform our data to daily, monthly, or even annual intervals, and perform magical calculations.

In [99]:
# Resampling time series data
df.set_index('Date', inplace=True)
resampled_df = df.resample('D').sum()
print(resampled_df)


             Event  Sales  Year  Month  Day  Minute  Seconds
Date                                                        
2023-07-15  Event1    100  2023      7   15       0        0
2023-07-16  Event2    150  2023      7   16       0        0
2023-07-17  Event3    200  2023      7   17       0        0
2023-07-18  Event4    180  2023      7   18       0        0
2023-07-19  Event5    220  2023      7   19       0        0
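
Resampling daily data to daily bins leaves it unchanged, so the effect is easier to see with a coarser interval. This sketch rebuilds the sales figures as a Series and sums them in two-day bins:

```python
import pandas as pd

sales = pd.Series(
    [100, 150, 200, 180, 220],
    index=pd.to_datetime(['2023-07-15', '2023-07-16', '2023-07-17',
                          '2023-07-18', '2023-07-19']),
    name='Sales'
)

# Combine the daily figures into two-day bins and add them up
two_day_totals = sales.resample('2D').sum()
print(two_day_totals)
```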


<div class="alert alert-info"><h2>8.7 Shifting Time Index</h2>
    
Shifting the time index is like a magical leap through time. We can shift our data forward or backward in time, opening new windows of insight.

In [100]:
# Shifting time index
shifted_df = df.shift(periods=1, freq='D')
print(shifted_df)


             Event  Sales  Year  Month  Day  Minute  Seconds
Date                                                        
2023-07-16  Event1    100  2023      7   15       0        0
2023-07-17  Event2    150  2023      7   16       0        0
2023-07-18  Event3    200  2023      7   17       0        0
2023-07-19  Event4    180  2023      7   18       0        0
2023-07-20  Event5    220  2023      7   19       0        0
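
With `freq` given, `shift` moves the index and leaves the values where they are. Without it, the values slide and the index stays put, which is handy for comparing each day with the one before. A small sketch on the sales figures:

```python
import pandas as pd

sales = pd.Series(
    [100, 150, 200, 180, 220],
    index=pd.date_range('2023-07-15', periods=5, freq='D'),
    name='Sales'
)

# Without freq, the values slide down one row and the index stays put
previous_day = sales.shift(1)

# Day-over-day change; the first entry is NaN because it has no predecessor
daily_change = sales - previous_day
print(daily_change)
```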


<div class="alert alert-info"><h2>8.8 Time Zone Handling</h2>

Time zones are like different magical realms, each with its own flow of time. We first attach a time zone to our data with tz_localize, and can then convert it to other time zones with tz_convert to harmonize their chronology.

In [101]:
# Time zone handling
df_utc = df.tz_localize('UTC')
df_est = df_utc.tz_convert('US/Eastern')
print(df_est)


                            Event  Sales  Year  Month  Day  Minute  Seconds
Date                                                                       
2023-07-14 20:00:00-04:00  Event1    100  2023      7   15       0        0
2023-07-15 20:00:00-04:00  Event2    150  2023      7   16       0        0
2023-07-16 20:00:00-04:00  Event3    200  2023      7   17       0        0
2023-07-17 20:00:00-04:00  Event4    180  2023      7   18       0        0
2023-07-18 20:00:00-04:00  Event5    220  2023      7   19       0        0
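
The same two steps work on a column through the `.dt` accessor. This sketch assumes the time zone database is available (as it normally is with pandas) and uses `'Asia/Tokyo'` purely as an example target:

```python
import pandas as pd

# Naive timestamps (no time zone attached)
naive = pd.Series(pd.to_datetime(['2023-07-15 00:00', '2023-07-16 00:00']))

# tz_localize labels the times as UTC; tz_convert re-expresses them in Tokyo time
utc = naive.dt.tz_localize('UTC')
tokyo = utc.dt.tz_convert('Asia/Tokyo')
print(tokyo)

# The wall-clock hour changes, but each value is still the same instant
same_instant = utc.iloc[0] == tokyo.iloc[0]
print(same_instant)
```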


# <div class="alert alert-warning">Chapter 9. Manipulating the DataFrame

![data%20manipulation.jpg](attachment:data%20manipulation.jpg)

## <div class="alert alert-warning">9.1 Introduction to DataFrame Manipulation

<div class="alert alert-warning">DataFrame manipulation is like having a versatile toolkit to reshape and transform your data. Just as an artist sculpts clay into various forms, DataFrame manipulation allows you to mold and restructure your data to suit your analysis needs. This flexible tool empowers you to filter, sort, add, remove, and modify data, unlocking valuable insights hidden within your datasets.

DataFrame manipulation lets you perform operations such as:
    

* Filtering: Selecting specific rows or columns based on conditions to focus on relevant data.

* Sorting: Arranging data in a particular order, making it easier to analyze patterns.

* Adding/Removing Columns: Creating new columns or deleting existing ones to customize your data structure.

* Modifying Values: Changing individual data points or applying mathematical operations to update data.

* Grouping: Grouping data based on common attributes to perform aggregated analyses.

* Merging/Joining: Combining multiple DataFrames to consolidate information from different sources.
    

With DataFrame manipulation, you can tailor your data to extract meaningful insights, just as an artist crafts a masterpiece from raw materials, turning data into valuable information for informed decision-making.

In [118]:
# Example DataFrame for DataFrame Manipulation
import pandas as pd

data = {
    'Name': ['JJ', 'Cody', 'Cece', 'Yoyo', 'Tom'],
    'Age': [25, 30, 22, 27, 29],
    'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney']
}

df = pd.DataFrame(data)

#displaying dataframe
print(df)


   Name  Age      City
0    JJ   25  New York
1  Cody   30    London
2  Cece   22     Paris
3  Yoyo   27     Tokyo
4   Tom   29    Sydney


## <div class="alert alert-warning">9.2 Adding New Columns

In [119]:
# adding a new column named "Country"
df["Country"] = ["USA","UK","France","Japan","Australia"]

#display the updated dataframe
df

   Name  Age      City    Country
0    JJ   25  New York        USA
1  Cody   30    London         UK
2  Cece   22     Paris     France
3  Yoyo   27     Tokyo      Japan
4   Tom   29    Sydney  Australia


<div class="alert alert-warning"><h2>9.3 Adding New Rows</h2>
    
We can add a new row by assigning to a new index label with .loc. If the label already exists, the existing row is overwritten rather than duplicated.


In [120]:
#adding new row by giving an index number
df.loc[5]= ["Nico",27,"Manhattan","USA"]

#displaying dataframe
df

   Name  Age       City    Country
0    JJ   25   New York        USA
1  Cody   30     London         UK
2  Cece   22      Paris     France
3  Yoyo   27      Tokyo      Japan
4   Tom   29     Sydney  Australia
5  Nico   27  Manhattan        USA


In [121]:
#adding new row using label
df.loc["Nina"]= ["Nina",23,"Delhi","India"]

#displaying dataframe
df

      Name  Age       City    Country
0       JJ   25   New York        USA
1     Cody   30     London         UK
2     Cece   22      Paris     France
3     Yoyo   27      Tokyo      Japan
4      Tom   29     Sydney  Australia
5     Nico   27  Manhattan        USA
Nina  Nina   23      Delhi      India
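
Two details are worth trying out on a small made-up frame: assigning to an existing label replaces that row, and `pd.concat` is a common way to append rows with a fresh index. A minimal sketch (the `people` and `extra` frames are hypothetical):

```python
import pandas as pd

people = pd.DataFrame({'Name': ['JJ', 'Cody'], 'Age': [25, 30]})

# Assigning to a label that already exists replaces that row
people.loc[1] = ['Cece', 22]

# pd.concat appends rows; ignore_index=True renumbers the result from 0
extra = pd.DataFrame({'Name': ['Yoyo'], 'Age': [27]})
combined = pd.concat([people, extra], ignore_index=True)
print(combined)
```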


<div class="alert alert-warning"><h2>9.3 Removing Columns</h2>
    
Removing columns is like clearing away unnecessary clutter, just like trimming the extra branches from a tree.

In [122]:
# Removing columns
df_city = df.drop('City', axis=1)

#displaying dataframe
df_city


      Name  Age    Country
0       JJ   25        USA
1     Cody   30         UK
2     Cece   22     France
3     Yoyo   27      Japan
4      Tom   29  Australia
5     Nico   27        USA
Nina  Nina   23      India


## <div class="alert alert-warning">9.5 Removing Rows

In [123]:
#dropping rows using index number
df_drop = df.drop(df.index[1])

#displaying the dataframe
df_drop

      Name  Age       City    Country
0       JJ   25   New York        USA
2     Cece   22      Paris     France
3     Yoyo   27      Tokyo      Japan
4      Tom   29     Sydney  Australia
5     Nico   27  Manhattan        USA
Nina  Nina   23      Delhi      India


In [124]:
#dropping multiple rows using their index positions
df_drops = df.drop(df.index[[2,5]])

#displaying the dataframe
df_drops

      Name  Age      City    Country
0       JJ   25  New York        USA
1     Cody   30    London         UK
3     Yoyo   27     Tokyo      Japan
4      Tom   29    Sydney  Australia
Nina  Nina   23     Delhi      India


In [125]:
#dropping a range of rows
df_row_drop = df.drop(df.index[1:])

#displaying the dataframe
df_row_drop

  Name  Age      City Country
0   JJ   25  New York     USA


<div class="alert alert-warning"><h2>9.4 Renaming Columns</h2>
    
Renaming columns is like giving our DataFrame a new identity.

In [126]:
# Renaming columns
df.rename(columns={'Name': 'Full Name'}, inplace=True)

#displaying dataframe
df


     Full Name  Age       City    Country
0           JJ   25   New York        USA
1         Cody   30     London         UK
2         Cece   22      Paris     France
3         Yoyo   27      Tokyo      Japan
4          Tom   29     Sydney  Australia
5         Nico   27  Manhattan        USA
Nina      Nina   23      Delhi      India


<div class="alert alert-warning"><h2>9.5 Handling Missing Values</h2>

Missing values are like hidden puzzles that we need to solve. We can fill them, drop them, or use them in enchanting ways.

In [127]:
# Handling missing values

#filling missing values in the "Age" column with its mean
df['Age'] = df['Age'].fillna(df['Age'].mean())

#displaying dataframe
df


     Full Name  Age       City    Country
0           JJ   25   New York        USA
1         Cody   30     London         UK
2         Cece   22      Paris     France
3         Yoyo   27      Tokyo      Japan
4          Tom   29     Sydney  Australia
5         Nico   27  Manhattan        USA
Nina      Nina   23      Delhi      India


In [128]:
#dropping null values

df = df.dropna()

#displaying dataframe
df

     Full Name  Age       City    Country
0           JJ   25   New York        USA
1         Cody   30     London         UK
2         Cece   22      Paris     France
3         Yoyo   27      Tokyo      Japan
4          Tom   29     Sydney  Australia
5         Nico   27  Manhattan        USA
Nina      Nina   23      Delhi      India
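
Our DataFrame has no gaps, so both cells above leave it unchanged. To see the two approaches at work, here is a sketch on a made-up frame with one missing age (the `people` data is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age
people = pd.DataFrame({'Name': ['Ann', 'Ben', 'Cal'],
                       'Age': [20.0, np.nan, 30.0]})

# Option 1: replace the gap with the column mean (NaN is skipped when averaging)
filled = people.assign(Age=people['Age'].fillna(people['Age'].mean()))

# Option 2: drop every row that contains a missing value
dropped = people.dropna()

print(filled)
print(dropped)
```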


<div class="alert alert-warning"><h2>9.6 Sorting the DataFrame</h2>
    
Sorting is like arranging the scrolls in a particular order for easy discovery.

In [129]:
# sorting the dataframe by a column in ascending order
sorted_df = df.sort_values(by = "Age")

# displaying the dataframe
sorted_df

     Full Name  Age       City    Country
2         Cece   22      Paris     France
Nina      Nina   23      Delhi      India
0           JJ   25   New York        USA
3         Yoyo   27      Tokyo      Japan
5         Nico   27  Manhattan        USA
4          Tom   29     Sydney  Australia
1         Cody   30     London         UK


In [130]:
# sorting the dataframe by a column in descending order
sorted_df = df.sort_values(by = "Age", ascending=False)

# displaying the dataframe
sorted_df

     Full Name  Age       City    Country
1         Cody   30     London         UK
4          Tom   29     Sydney  Australia
3         Yoyo   27      Tokyo      Japan
5         Nico   27  Manhattan        USA
0           JJ   25   New York        USA
Nina      Nina   23      Delhi      India
2         Cece   22      Paris     France
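
When several rows share the same age, a second column can break the tie. A short sketch on a made-up frame, sorting by age (oldest first) and then by name:

```python
import pandas as pd

# Hypothetical data in which two people share an age
people = pd.DataFrame({'Name': ['Yoyo', 'Nico', 'Cody', 'Cece'],
                       'Age': [27, 27, 30, 22]})

# Oldest first; people of the same age are ordered alphabetically by name
ranked = people.sort_values(by=['Age', 'Name'], ascending=[False, True])
print(ranked)
```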


<div class="alert alert-warning"><h2>9.7 Filtering Data</h2>
    
Filtering keeps only the rows that meet specific criteria. It's like finding the gems among a sea of treasures.

In [131]:
# Filtering data
filtered_df = df[df['Age'] > 25]

print(filtered_df)


  Full Name  Age       City    Country
1      Cody   30     London         UK
3      Yoyo   27      Tokyo      Japan
4       Tom   29     Sydney  Australia
5      Nico   27  Manhattan        USA
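
Conditions can also be combined. The sketch below, on a made-up frame, uses `&` for "and" and `isin()` for membership tests:

```python
import pandas as pd

# Hypothetical data
people = pd.DataFrame({
    'Name': ['JJ', 'Cody', 'Cece', 'Nico'],
    'Age': [25, 30, 22, 27],
    'Country': ['USA', 'UK', 'France', 'USA']
})

# Each condition goes in its own parentheses; & means "and", | means "or"
older_in_usa = people[(people['Age'] > 24) & (people['Country'] == 'USA')]

# isin() checks membership in a list of allowed values
europeans = people[people['Country'].isin(['UK', 'France'])]

print(older_in_usa)
print(europeans)
```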


## <div class="alert alert-warning">9.8 Applying Functions to Data

In [132]:
# Applying functions to data
def add_ten(age):
    return age + 10

df['Age'] = df['Age'].apply(add_ten)

#displaying dataframe
print(df)


     Full Name  Age       City    Country
0           JJ   35   New York        USA
1         Cody   40     London         UK
2         Cece   32      Paris     France
3         Yoyo   37      Tokyo      Japan
4          Tom   39     Sydney  Australia
5         Nico   37  Manhattan        USA
Nina      Nina   33      Delhi      India
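
For simple arithmetic like this, the same result can usually be had without `apply` by operating on the whole column at once. A small sketch comparing the two on a standalone Series:

```python
import pandas as pd

ages = pd.Series([25, 30, 22], name='Age')

# apply() calls the function once per value
via_apply = ages.apply(lambda age: age + 10)

# The same arithmetic applied to the whole column at once
vectorized = ages + 10

print(vectorized)
```

`apply` remains useful when the logic cannot be written as a column-wide expression.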


<div class="alert alert-warning"><h2>9.9 Updating Columns</h2>

Updating columns in a DataFrame allows us to modify existing data or create new insights from our data.</div>

In [133]:
# Adding a new column with arithmetic operation
df['Age_In_5_Years'] = df['Age'] + 5

# Displaying the updated DataFrame
print(df)

     Full Name  Age       City    Country  Age_In_5_Years
0           JJ   35   New York        USA              40
1         Cody   40     London         UK              45
2         Cece   32      Paris     France              37
3         Yoyo   37      Tokyo      Japan              42
4          Tom   39     Sydney  Australia              44
5         Nico   37  Manhattan        USA              42
Nina      Nina   33      Delhi      India              38


<div class="alert alert-warning"><h2>9.10 Updating a Single Cell in a DataFrame</h2>
    
To update a single cell in a DataFrame, we can use the .at or .loc accessors.

The <strong>.at accessor</strong> is used for fast label-based access to a single value.
    
The <strong>.loc accessor</strong> is used for label-based access with more flexibility, such as selecting several rows or columns at once.

In [134]:
# Using .at method to update a single cell
df.at[1, 'Age'] = 31

# Displaying the DataFrame after updating cells
print(df)

     Full Name  Age       City    Country  Age_In_5_Years
0           JJ   35   New York        USA              40
1         Cody   31     London         UK              45
2         Cece   32      Paris     France              37
3         Yoyo   37      Tokyo      Japan              42
4          Tom   39     Sydney  Australia              44
5         Nico   37  Manhattan        USA              42
Nina      Nina   33      Delhi      India              38


In [135]:
# Using .loc method to update a single cell
df.loc[0, 'City'] = 'San Francisco'

# Displaying the DataFrame after updating cells
print(df)

     Full Name  Age           City    Country  Age_In_5_Years
0           JJ   35  San Francisco        USA              40
1         Cody   31         London         UK              45
2         Cece   32          Paris     France              37
3         Yoyo   37          Tokyo      Japan              42
4          Tom   39         Sydney  Australia              44
5         Nico   37      Manhattan        USA              42
Nina      Nina   33          Delhi      India              38


<div class="alert alert-warning"><h2>9.11 Unique Values</h2>

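The pd.unique() function returns an array of the unique values in a Series. For a whole DataFrame, nunique() counts the distinct values in each column, and value_counts() counts how often each distinct row appears.

In [140]:
# Counting the unique values in each column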
df.nunique()


Full Name         7
Age               6
City              7
Country           6
Age_In_5_Years    6
dtype: int64

In [143]:
df.value_counts()

Full Name  Age  City           Country    Age_In_5_Years
Cece       32   Paris          France     37                1
Cody       31   London         UK         45                1
JJ         35   San Francisco  USA        40                1
Nico       37   Manhattan      USA        42                1
Nina       33   Delhi          India      38                1
Tom        39   Sydney         Australia  44                1
Yoyo       37   Tokyo          Japan      42                1
Name: count, dtype: int64
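
On a single column the three tools are easier to tell apart. A minimal sketch on a made-up Series of countries:

```python
import pandas as pd

countries = pd.Series(['USA', 'UK', 'France', 'Japan', 'USA'], name='Country')

# Distinct values, in order of first appearance
distinct = countries.unique()
print(distinct)

# Number of distinct values
print(countries.nunique())

# How often each value occurs, most frequent first
country_counts = countries.value_counts()
print(country_counts)
```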

<div class="alert alert-warning"><h2>9.12 Replace elements</h2>
    
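The replace() method in Pandas allows you to replace specific values in a DataFrame or Series. Values that do not occur are simply left alone: in the cell below no age equals 30 any more, so the DataFrame prints unchanged.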

In [144]:
# Using pandas replace
df['Age'] = df['Age'].replace(30, 31)
print(df)

     Full Name  Age           City    Country  Age_In_5_Years
0           JJ   35  San Francisco        USA              40
1         Cody   31         London         UK              45
2         Cece   32          Paris     France              37
3         Yoyo   37          Tokyo      Japan              42
4          Tom   39         Sydney  Australia              44
5         Nico   37      Manhattan        USA              42
Nina      Nina   33          Delhi      India              38
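
`replace()` also accepts a dictionary, which swaps several values in one call. A short sketch on a made-up frame:

```python
import pandas as pd

# Hypothetical data
people = pd.DataFrame({'Name': ['JJ', 'Cody', 'Nico'],
                       'Country': ['USA', 'UK', 'USA']})

# A dict maps each old value to its replacement; unmatched values are untouched
people['Country'] = people['Country'].replace({'USA': 'United States',
                                               'UK': 'United Kingdom'})
print(people)
```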


![goodbye.jpg](attachment:goodbye.jpg)

by

**VINODHINI RAJAMANICKAM**

Course - Data Science