# Pandas Syntax and Usage
In this notebook, we will cover the essential Pandas syntax step by step.
Pandas is one of the most widely used libraries for data analysis and manipulation.
We will also incorporate the sample data and questions from your previous work.


In [1]:
# Importing pandas and setting up
import pandas as pd
import numpy as np

# Sample data (to be used with questions above)
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'Age': [24, 27, 22, 32, 29],
    'Score': [85, 92, 88, 79, 95],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}

# Creating DataFrame
df = pd.DataFrame(data)
df

### 1. Creating DataFrame
We created a DataFrame from a dictionary. Here is the basic syntax:

```python
df = pd.DataFrame(data)
```
This `DataFrame` is a 2-dimensional labeled data structure with columns of potentially different types. Let's move on to some operations!
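As a quick check of the "2-dimensional, labeled, mixed types" description, the sketch below rebuilds the same sample data and inspects its shape, column labels, and per-column dtypes. The variable names `shape`, `columns`, and `dtypes` are only illustrative.

```python
import pandas as pd

# Same sample data as in the notebook
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'Age': [24, 27, 22, 32, 29],
    'Score': [85, 92, 88, 79, 95],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)

shape = df.shape            # tuple of (number of rows, number of columns)
columns = list(df.columns)  # column labels, in dictionary order
dtypes = df.dtypes          # one dtype per column (integers vs. text here)

print(shape)
print(columns)
print(dtypes)
```

Each dictionary key becomes a column label, and each list becomes that column's values, so all lists must have the same length.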

In [2]:
# Selecting a single column
df['Age']

### 2. Selecting Columns
To select a single column in a DataFrame, use the syntax `df['column_name']`:

```python
df['Age']
```
This returns the 'Age' column as a pandas `Series`. You can also select multiple columns by passing a list of column names.
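A minimal sketch of what single-column selection gives back, using a trimmed copy of the sample data. The result is a `Series`, so aggregate methods such as `.mean()` can be called on it directly.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'Age': [24, 27, 22, 32, 29],
})

ages = df['Age']          # a single label returns a Series
mean_age = ages.mean()    # Series methods work directly on the column

print(type(ages).__name__)
print(ages.name)
print(mean_age)
```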

In [3]:
# Selecting multiple columns
df[['Name', 'Score']]

### 3. Selecting Multiple Columns
To select multiple columns, pass a list of column names to the indexing brackets, which is why the syntax shows double brackets:

```python
df[['Name', 'Score']]
```
This will return only the 'Name' and 'Score' columns.
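The difference between single and double brackets can be seen with one column. In this sketch (on a trimmed copy of the sample data), a plain label returns a `Series`, while a one-item list returns a `DataFrame` with a single column.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'Score': [85, 92, 88, 79, 95],
})

as_series = df['Score']     # single label -> Series
as_frame = df[['Score']]    # one-item list -> DataFrame with one column

print(type(as_series).__name__, type(as_frame).__name__)
print(as_frame.shape)
```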

In [4]:
# Filtering rows based on a condition
df[df['Age'] > 25]

### 4. Filtering Rows
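To filter rows, index the DataFrame with a boolean condition. The condition `df['Age'] > 25` produces one True/False value per row, and only the True rows are kept. For example, filtering people older than 25: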

```python
df[df['Age'] > 25]
```
This will return only the rows where 'Age' is greater than 25.
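Conditions can also be combined. The sketch below, on a trimmed copy of the sample data, uses `&` for "and" (use `|` for "or"). Each condition needs its own parentheses because `&` binds more tightly than the comparison operators.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'Age': [24, 27, 22, 32, 29],
    'Score': [85, 92, 88, 79, 95],
})

# Boolean mask: one True/False value per row
mask = (df['Age'] > 25) & (df['Score'] > 90)

older_high_scorers = df[mask]
print(older_high_scorers)
```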

In [5]:
# Sorting data
df.sort_values(by='Score', ascending=False)

### 5. Sorting Data
You can sort data based on column values. To sort by the 'Score' column in descending order:

```python
df.sort_values(by='Score', ascending=False)
```
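The `ascending=False` argument sorts the values in descending order. Set it to `True` (the default) for ascending order. Note that `sort_values()` returns a new, sorted DataFrame and leaves `df` itself unchanged.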
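The following sketch, on a trimmed copy of the sample data, shows that sorting produces a new object and that the original row labels travel with the rows. `reset_index(drop=True)` is one way to renumber the result; whether you need it depends on how you use the index afterwards.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'Score': [85, 92, 88, 79, 95],
})

# Returns a new DataFrame; df keeps its original order
sorted_df = df.sort_values(by='Score', ascending=False)

# Optional: replace the shuffled row labels with 0, 1, 2, ...
renumbered = sorted_df.reset_index(drop=True)

print(sorted_df)
print(renumbered)
```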

In [6]:
# Grouping data
df.groupby('City')['Score'].mean()

### 6. Grouping Data
Pandas allows you to group data and perform aggregate functions on it. For example, to calculate the average score by city:

```python
df.groupby('City')['Score'].mean()
```
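This groups the DataFrame by the 'City' column and calculates the mean score for each group. In the sample data every city appears only once, so each group's mean is simply that person's score.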
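Grouping is easier to see when a key repeats. The small table below is hypothetical (it is not part of the notebook's sample data) and has two rows per city, so the mean actually averages something.

```python
import pandas as pd

# Hypothetical data with repeated cities, for illustration only
scores = pd.DataFrame({
    'City': ['New York', 'Chicago', 'New York', 'Chicago'],
    'Score': [80, 90, 100, 70],
})

# One mean per distinct city
means = scores.groupby('City')['Score'].mean()

# Several aggregates at once: one column per aggregate
summary = scores.groupby('City')['Score'].agg(['mean', 'count'])

print(means)
print(summary)
```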

In [7]:
# Adding a new column
df['Age Group'] = ['Young' if age < 30 else 'Old' for age in df['Age']]
df

### 7. Adding a New Column
To add a new column, you can assign values to a new column name. For example, to add an 'Age Group' column based on age:

```python
df['Age Group'] = ['Young' if age < 30 else 'Old' for age in df['Age']]
```
This creates a new column 'Age Group' that labels ages under 30 as 'Young' and all others as 'Old'.
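The list comprehension works well here. As an alternative sketch, NumPy's `np.where` expresses the same rule on the whole column at once. This is a substitute technique, not what the notebook cell uses.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'Age': [24, 27, 22, 32, 29],
})

# np.where(condition, value_if_true, value_if_false), applied element-wise
df['Age Group'] = np.where(df['Age'] < 30, 'Young', 'Old')

print(df)
```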

In [8]:
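# Handling missing data
df_missing = df.copy()
df_missing['Age'] = df_missing['Age'].astype(float)  # NaN requires a float column
df_missing.loc[2, 'Age'] = np.nan
# numeric_only=True skips text columns such as 'Name' and 'City'
df_missing.fillna(df_missing.mean(numeric_only=True))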

### 8. Handling Missing Data
Pandas provides tools to handle missing data. For example, you can fill missing values with the mean:

```python
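df_missing.fillna(df_missing.mean(numeric_only=True))
```
This replaces missing values (NaN) in each numeric column with that column's mean. The `numeric_only=True` argument is needed because the DataFrame also contains text columns, and recent pandas versions raise an error when asked for the mean of text. Like most pandas methods, `fillna()` returns a new DataFrame rather than modifying `df_missing`.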
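A self-contained sketch of the same idea, using a trimmed copy of the sample data with one age already missing. It counts the missing values first with `isna()`, then fills them.

```python
import numpy as np
import pandas as pd

df_missing = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'Age': [24, 27, np.nan, 32, 29],   # Charlie's age is missing
    'Score': [85, 92, 88, 79, 95],
})

# Number of missing values per column
missing_counts = df_missing.isna().sum()

# Means of the numeric columns only; NaN is ignored when averaging
column_means = df_missing.mean(numeric_only=True)

# Returns a new DataFrame; df_missing still contains the NaN
filled = df_missing.fillna(column_means)

print(missing_counts)
print(filled)
```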

In [9]:
# Merging DataFrames
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 3], 'Age': [24, 27, 22]})
merged_df = pd.merge(df1, df2, on='ID')
merged_df

### 9. Merging DataFrames
To merge DataFrames, use the `pd.merge()` function. For example, merging two DataFrames on the 'ID' column:

```python
merged_df = pd.merge(df1, df2, on='ID')
```
This combines the two DataFrames on the common 'ID' column. By default `pd.merge()` performs an inner join, so only IDs present in both DataFrames appear in the result.
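The join type matters once the IDs do not match exactly. In the sketch below, `df2` is a hypothetical variant with IDs 2, 3, and 4 (not the notebook's data), so the `how` argument changes which rows survive.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
# Hypothetical: ID 1 is missing here and ID 4 has no match in df1
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Age': [27, 22, 32]})

inner = pd.merge(df1, df2, on='ID')               # default: IDs in both
left = pd.merge(df1, df2, on='ID', how='left')    # every row of df1
outer = pd.merge(df1, df2, on='ID', how='outer')  # IDs from either side

print(inner)
print(left)
print(outer)
```

Rows without a match on the other side get NaN in the columns that come from it.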

In [10]:
# Saving the DataFrame to a CSV file
df.to_csv('output.csv', index=False)

### 10. Saving DataFrame to CSV
You can save your DataFrame to a CSV file using the `to_csv()` method:

```python
df.to_csv('output.csv', index=False)
```
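This saves the DataFrame to a CSV file named 'output.csv' in the current working directory. The `index=False` argument leaves the row labels (0, 1, 2, ...) out of the file.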
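To see what `to_csv()` produces without writing a file, the sketch below omits the path, in which case the CSV text is returned as a string. It then reads that text back with `pd.read_csv()` through an in-memory buffer. The buffer is only a stand-in for a real file path.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 27, 22],
})

# With no path given, to_csv returns the CSV content as a string
csv_text = df.to_csv(index=False)
header = csv_text.splitlines()[0]

# Read it back; io.StringIO makes the string behave like a file
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
print(restored)
```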