<h1 style="color:orange">DataFrame Preparation and Manipulation with Pandas</h1> 

Pandas is a powerful library in Python for data manipulation and analysis. Central to its functionality is the DataFrame, which is a two-dimensional labeled data structure. In this tutorial, we'll cover some of the most important operations to prepare and manipulate a DataFrame using Pandas.

In [None]:
import pandas as pd

## 3. Creating a DataFrame
You can create a DataFrame from various data sources such as lists, dictionaries, NumPy arrays, CSV files, Excel files, and more. Here's how you can create a DataFrame from a dictionary:

In [None]:
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'Height': [5.2, 6.0, 5.6, 5.10],  # note: the float 5.10 is stored as 5.1
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}

df = pd.DataFrame(data)
print(df)
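The constructor also accepts some of the other inputs listed above. The following is a minimal sketch, assuming NumPy is installed; the sample values and the names `records`, `df_records`, `arr`, and `df_array` are illustrative and not part of the tutorial's dataset.

```python
import numpy as np
import pandas as pd

# From a list of dictionaries: each dictionary becomes one row
records = [{'Name': 'Alice', 'Age': 25}, {'Name': 'Bob', 'Age': 30}]
df_records = pd.DataFrame(records)

# From a NumPy array: column labels are supplied explicitly
arr = np.array([[1, 2], [3, 4]])
df_array = pd.DataFrame(arr, columns=['a', 'b'])

print(df_records)
print(df_array)
```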

## 4. Basic DataFrame Operations
### 4.1. Viewing DataFrame Information
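You can use the info() method to get a concise summary of the DataFrame (column names, non-null counts, and dtypes). The method prints the summary itself and returns None, so there is no need to wrap it in print():

In [None]:
df.info()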

### 4.2. Viewing DataFrame Head and Tail
To view the first few rows of the DataFrame, you can use the head() method:

In [None]:
print(df.head())

To view the last few rows, you can use the tail() method:

In [None]:
print(df.tail())
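Both methods return five rows by default and accept a row count as an argument. A small sketch, using a hypothetical ten-row frame named `demo` so the effect is visible:

```python
import pandas as pd

demo = pd.DataFrame({'x': range(10)})  # illustrative frame with 10 rows

first_three = demo.head(3)  # rows at positions 0, 1, 2
last_two = demo.tail(2)     # rows at positions 8, 9

print(first_three)
print(last_two)
```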

## 5. Indexing and Selecting Data
### 5.1. Selecting Columns
You can select a single column by specifying its name:

In [None]:
print(df['Name'])

You can select multiple columns by passing a list of column names:

In [None]:
print(df[['Name', 'Age']])

### 5.2. Selecting Rows
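You can select rows by integer position with iloc[] or by index label with loc[]. With the default integer index, the two look alike, but they are not interchangeable:

In [None]:
print(df.iloc[0])  # Select the first row (position 0)
print(df.loc[1])   # Select the row whose index label is 1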
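The difference between the two accessors is easier to see when the index is not the default range. A minimal sketch with a hypothetical frame `people` that uses string labels:

```python
import pandas as pd

# Illustrative frame with string index labels instead of 0, 1, 2
people = pd.DataFrame({'Age': [25, 30, 35]}, index=['a', 'b', 'c'])

by_position = people.iloc[0]  # first row, whatever its label is
by_label = people.loc['b']    # row whose index label is 'b'

print(by_position['Age'], by_label['Age'])
```

Here `people.loc[0]` would raise a KeyError, because no row carries the label 0.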

### 5.3. Conditional Selection
You can select rows based on certain conditions:

In [None]:
print(df[df['Age'] > 30])  # Select rows where Age is greater than 30
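Several conditions can be combined in one selection. The sketch below rebuilds a small frame (named `people` here, purely for illustration) so that it runs on its own:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                       'Age': [25, 30, 35, 40],
                       'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']})

# Wrap each condition in parentheses; use & for "and", | for "or"
over_25_not_chicago = people[(people['Age'] > 25) & (people['City'] != 'Chicago')]
youngest_or_oldest = people[(people['Age'] == 25) | (people['Age'] == 40)]

print(over_25_not_chicago)
print(youngest_or_oldest)
```

The parentheses matter: `&` and `|` bind more tightly than comparison operators in Python, so omitting them changes the meaning of the expression.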

## 6. Adding and Removing Columns
### 6.1. Adding a Column
You can add a new column to the DataFrame by assigning a list with one value per row:

In [None]:
df['Gender'] = ['Female', 'Male', 'Male', 'Female']
print(df)
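New columns are often derived from existing ones rather than typed in by hand. A short sketch, with the column names `AgeInMonths` and `IsAdult` invented for illustration:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Derive a column from an existing one (modifies `people` in place)
people['AgeInMonths'] = people['Age'] * 12

# assign() returns a new DataFrame and leaves `people` unchanged
with_flag = people.assign(IsAdult=people['Age'] >= 18)

print(people)
print(with_flag)
```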

### 6.2. Removing a Column
You can remove a column using the drop() method. Passing inplace=True modifies df directly instead of returning a new DataFrame:

In [None]:
df.drop('City', axis=1, inplace=True)
print(df)
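As an alternative to `inplace=True`, drop() can return a new DataFrame and leave the original untouched. A minimal sketch with an illustrative frame named `people`:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob'],
                       'Age': [25, 30],
                       'City': ['New York', 'Chicago']})

# columns=... is equivalent to passing the labels with axis=1
trimmed = people.drop(columns=['City', 'Age'])

print(trimmed)          # only the Name column remains
print(people.columns)   # the original still has all three columns
```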

## 7. Data Manipulation
### 7.1. Sorting Data
You can sort the DataFrame based on one or more columns:

In [None]:
print(df.sort_values(by='Age'))  # Sort by Age
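Sorting in descending order or by several columns uses the `ascending` parameter. A sketch on a self-contained copy of the sample data (the name `people` is illustrative):

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                       'Gender': ['Female', 'Male', 'Male', 'Female'],
                       'Age': [25, 30, 35, 40]})

# Oldest first
oldest_first = people.sort_values(by='Age', ascending=False)

# Gender A-Z, then Age descending within each gender
by_gender_then_age = people.sort_values(by=['Gender', 'Age'],
                                        ascending=[True, False])

print(oldest_first)
print(by_gender_then_age)
```

sort_values() returns a sorted copy; `people` itself keeps its original row order.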

### 7.2. Grouping Data
You can group data based on certain columns and perform operations:

In [None]:
grouped = df.groupby('Gender')
# numeric_only=True skips text columns such as Name, which cannot be averaged
print(grouped.mean(numeric_only=True))  # Mean of Age and Height by gender

## 8. Handling Missing Data
### 8.1. Detecting Missing Data
You can detect missing data using isnull() or notnull():

In [None]:
print(df.isnull())   # True where values are NaN
print(df.notnull())  # True where values are not NaN

### 8.2. Handling Missing Data
You can handle missing data by dropping the affected rows or by filling the gaps with a replacement value:

In [None]:
print(df.dropna())       # Drop rows with any NaN values
print(df.fillna(0))      # Fill NaN values with 0
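The sample DataFrame built earlier has no missing values, so the calls above return it unchanged. The sketch below uses a hypothetical frame `scores` with one NaN to show what each call does; filling with the column mean is one possible choice, not a recommendation.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                       'Score': [90.0, np.nan, 70.0]})

missing_per_column = scores.isnull().sum()  # number of NaN values per column
complete_rows = scores.dropna()             # drops the row with the NaN score

# Fill only the Score column, using the mean of its non-missing values
filled = scores.fillna({'Score': scores['Score'].mean()})

print(missing_per_column)
print(complete_rows)
print(filled)
```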

## 9. Grouping Data and Computing Statistics
Pandas provides the groupby() method to group data based on one or more columns and perform operations on these groups. This is a powerful feature for analyzing and summarizing data based on different categories.

### 9.1. Grouping Data by a Single Column
You can group data by a single column using the groupby() method. Here's how you can group data by the 'Gender' column and compute statistics:

In [None]:
grouped = df.groupby('Gender')
print(grouped.describe())

This prints the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column for each group.


### 9.2. Grouping Data by Multiple Columns
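You can also group data by multiple columns. For example, you can group data by both the 'Gender' and 'Age' columns:

In [None]:
grouped = df.groupby(['Gender', 'Age'])
# numeric_only=True skips text columns such as Name, which cannot be averaged
print(grouped.mean(numeric_only=True))

This computes the mean of the remaining numeric columns for each combination of gender and age that occurs in the data.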



### 9.3. Computing Statistics on Grouped Data
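Once data is grouped, you can compute statistics such as the mean, median, sum, or standard deviation for each group. For instance, to compute the mean age for each gender group, regroup by 'Gender' alone (the previous cell grouped by both 'Gender' and 'Age') and select the 'Age' column:

In [None]:
grouped = df.groupby('Gender')  # regroup by Gender only
print(grouped['Age'].mean())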


Or to compute the median height for each gender group:

In [None]:
print(df.groupby('Gender')['Height'].median())
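Several statistics can also be computed in one pass with agg(). The sketch below uses named aggregation on a self-contained copy of the sample columns; the output column names (`mean_age`, `median_height`, `count`) are chosen here for illustration.

```python
import pandas as pd

people = pd.DataFrame({'Gender': ['Female', 'Male', 'Male', 'Female'],
                       'Age': [25, 30, 35, 40],
                       'Height': [5.2, 6.0, 5.6, 5.1]})

# Each keyword is an output column: (source column, aggregation function)
summary = people.groupby('Gender').agg(
    mean_age=('Age', 'mean'),
    median_height=('Height', 'median'),
    count=('Age', 'size'),
)

print(summary)
```

The result has one row per gender and one column per requested statistic, which is often easier to read than the wide table that describe() produces.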


This section provides an overview of how to group data using the `groupby()` method and compute statistics on the grouped data. It demonstrates the flexibility of Pandas for analyzing and summarizing data based on different categories.