# Introduction to Pandas and Handling CSV Files

Pandas is a powerful Python library used for data manipulation and analysis,
offering data structures and operations for manipulating numerical tables and
time series.

Before we get started, make sure you run the code cell below to ensure Pandas is
installed in your Workspace.


In [None]:
!pip install pandas


## What is Pandas?

Pandas provides two key data structures: `DataFrame` and `Series`, with the former being a 2-dimensional labeled data structure with columns of potentially different types, and the latter a 1-dimensional labeled array.


In [None]:
import pandas as pd

# Example of creating a simple DataFrame
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 34, 29, 32]}
df = pd.DataFrame(data)

print(df)


- **DataFrame creation**: Here, we create a DataFrame from a dictionary. This structure is useful for representing real-world data in a tabular format.
- **Printing DataFrame**: By printing the DataFrame, we can see its tabular structure with rows and columns.
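
The other key structure, `Series`, can be sketched in the same way. The example below is a minimal illustration that reuses the names and ages from the DataFrame above; the variable names `ages` and `df_example` are chosen here for illustration only.

```python
import pandas as pd

# A Series is a one-dimensional labeled array: values plus an index of labels
ages = pd.Series([28, 34, 29, 32],
                 index=['John', 'Anna', 'Peter', 'Linda'],
                 name='Age')
print(ages)

# Look up a single value by its label
print(ages['Anna'])

# Selecting one column of a DataFrame also gives back a Series
df_example = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 34]})
print(type(df_example['Age']))
```

In practice, most Series you work with come from selecting a single column of a DataFrame, as in the last line.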


## What is a CSV File?

A CSV (Comma-Separated Values) file is a plain text file that stores tabular
data: typically each line is one row, and the values within a row are separated
by commas. Because it's a plain text file, it can contain only actual text data,
in other words, printable ASCII or Unicode characters.

Often the data you will work with in a machine learning project is stored in a CSV file.
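
To make the format concrete, here is a small sketch of what CSV text looks like and how Pandas interprets it. The content of `csv_text` is made up for illustration (it is not the file used later in this notebook), and `io.StringIO` is used only so the example runs without a file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content: the first line holds the column names,
# and every following line is one row of comma-separated values.
csv_text = """name,age,planet
Alice,30,Earth
Bob,,Mars
"""

# StringIO lets read_csv treat the string as if it were a file
sample = pd.read_csv(io.StringIO(csv_text))
print(sample)
```

Note that the empty field in Bob's row (between the two commas) is read as a missing value.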



## How to Load a CSV File

To load a CSV file into a Pandas DataFrame, we use the `pd.read_csv()` function, providing it with the path to the file.


In [None]:
# Load a CSV file
df = pd.read_csv('./fictional-characters.csv')

print(df)

- **Loading CSV**: The `pd.read_csv()` function reads a CSV file and converts it into a DataFrame.
- **Displaying the loaded DataFrame**: By printing the DataFrame, we see the data that was in the CSV file in a structured format.
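
`pd.read_csv()` also accepts optional parameters for files that do not follow the defaults. The sketch below shows three commonly used ones (`sep`, `usecols`, and `nrows`) on made-up, semicolon-separated text; the data and the variable names are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data, made up for illustration
raw_text = """name;age;planet
Alice;30;Earth
Bob;41;Mars
Cara;27;Earth
"""

subset = pd.read_csv(
    io.StringIO(raw_text),
    sep=';',                  # delimiter used in the text (the default is ',')
    usecols=['name', 'age'],  # load only these columns
    nrows=2,                  # read only the first 2 data rows
)
print(subset)
```

Limiting columns and rows this way can be handy for taking a quick look at a large file before loading it in full.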


## How to Look at the CSV File with `head()` and `info()`

- **Using `head()`**: This method returns the first `n` rows of the DataFrame (5 by default), which is useful for quickly checking whether your DataFrame contains the data you expect.
- **Using `info()`**: This method prints a concise summary of the DataFrame, including each column's data type and number of non-null entries.


In [None]:
# Display the first 5 rows of the DataFrame
print(df.head())

# Print a concise summary of the DataFrame
# (info() prints directly and returns None, so it is not wrapped in print())
df.info()


- **Inspecting the DataFrame**: `head()` helps us quickly peek at the data, while `info()` gives us a summary, including data types and missing values.
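
A few related inspection tools are sketched below on a small, made-up DataFrame (the data and the name `sample` are assumptions for illustration): passing a row count to `head()`, looking at the end of the data with `tail()`, and counting missing values per column.

```python
import pandas as pd

# Hypothetical data, made up for illustration
sample = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E', 'F'],
    'age': [25, 40, None, 31, 58, 19],
})

print(sample.head(3))   # first 3 rows instead of the default 5
print(sample.tail(2))   # last 2 rows
print(sample.shape)     # (number of rows, number of columns)
print(sample.dtypes)    # data type of each column

# Count the missing values in each column
missing_per_column = sample.isna().sum()
print(missing_per_column)
```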



## Explore Different Data Wrangling Exercises

Data wrangling, also known as data munging, involves cleaning, structuring, and enriching raw data into a desired format for better decision making in less time.



### Filtering Data: 

Select rows based on column values.


In [None]:

# Filter rows where the character's age is greater than 30
older_than_30 = df[df['age'] > 30]
print(older_than_30)
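
Filters can also combine several conditions. The following is a minimal sketch on made-up data (the rows and the name `sample` are assumptions, not the contents of the CSV file); it keeps only rows that satisfy both an age condition and a planet condition.

```python
import pandas as pd

# Hypothetical data, made up for illustration
sample = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'age': [25, 45, 120, 33],
    'planet': ['Earth', 'Mars', 'Earth', 'Earth'],
})

# Wrap each condition in parentheses; use & for "and", | for "or"
earth_over_30 = sample[(sample['age'] > 30) & (sample['planet'] == 'Earth')]
print(earth_over_30)
```

The parentheses matter: without them, Python evaluates `&` before the comparisons and the expression fails.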


### Handling Missing Data: 

You might have noticed Yoda's age is missing. Below we fill the missing age
values with the mean of all the known ages.

In [None]:
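# Fill missing values in 'age' column with the mean age.
# Assign the result back to the column: calling fillna(..., inplace=True) on
# df['age'] may not update df in recent pandas versions.
df['age'] = df['age'].fillna(df['age'].mean())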
print(df)
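
Before filling gaps, it can help to see which rows are affected. The sketch below uses made-up data (the rows and variable names are assumptions for illustration) to locate the missing ages first and then fill them with the mean.

```python
import pandas as pd

# Hypothetical data with one missing age, made up for illustration
sample = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'age': [20.0, None, 40.0],
})

# Identify the rows where 'age' is missing
missing_rows = sample[sample['age'].isna()]
print(missing_rows)

# mean() skips missing values, so it averages only the known ages
mean_age = sample['age'].mean()

# Fill the gaps and assign the result back to the column
sample['age'] = sample['age'].fillna(mean_age)
print(sample)
```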



### Adding Columns: 

Compute new columns based on existing ones.

In [None]:
# Add a new column 'AgeNextYear'
df['AgeNextYear'] = df['age'] + 1
print(df)
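
New columns do not have to be numeric. As a rough sketch on made-up data, the example below adds a True/False column from a comparison and a second derived column with `assign()`; the column names `is_adult` and `age_in_months` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data, made up for illustration
sample = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'age': [12, 30, 70],
})

# A comparison produces a True/False value for every row
sample['is_adult'] = sample['age'] >= 18

# assign() returns a new DataFrame with the extra column added
sample = sample.assign(age_in_months=sample['age'] * 12)
print(sample)
```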



### Grouping and Aggregating:

Group data and calculate aggregate statistics. Below, we group the characters
by their home planet and then calculate the average age of the characters from
each planet. As you will see, the average age of characters from Earth is
134.6 years.


In [None]:

# Group by 'planet' and calculate the mean age of each group
grouped = df.groupby('planet')['age'].mean()
print(grouped)
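
A group can be summarized with more than one statistic at a time. The sketch below uses made-up data (the rows and the names `sample` and `summary` are assumptions for illustration) and passes a list of statistic names to `agg()`.

```python
import pandas as pd

# Hypothetical data, made up for illustration
sample = pd.DataFrame({
    'planet': ['Earth', 'Earth', 'Mars', 'Mars', 'Mars'],
    'age': [30, 50, 10, 20, 60],
})

# Compute several statistics per planet in one step
summary = sample.groupby('planet')['age'].agg(['mean', 'min', 'max', 'count'])
print(summary)
```

The result is a DataFrame with one row per planet and one column per statistic.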


- **Practical Examples**: These examples demonstrate how to manipulate and analyze data using Pandas, including filtering, adding new data, handling missing values, and aggregating data. 

These steps offer a starting point for data analysis with Pandas, providing tools to load, view, and wrangle data from CSV files effectively.