```python
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory (i.e. /kaggle/input)
# For example, running this (by clicking Run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session
```

### Introduction to Pandas in Python

**Pandas** is a powerful open-source library in Python designed for data manipulation and analysis. Built on top of **NumPy**, it provides data structures and functions needed to clean, analyze, and visualize structured data efficiently. It is particularly well-suited for handling tabular data like spreadsheets, databases, or time-series data.

---

### Key Features of Pandas:
1. **Data Structures**:
   - **Series**: One-dimensional labeled array, like a column in a spreadsheet.
   - **DataFrame**: Two-dimensional labeled data structure, similar to a table in a database or Excel spreadsheet.

2. **Data Manipulation**:
   - Supports operations like filtering, grouping, merging, reshaping, and cleaning data.

3. **Handling Missing Data**:
   - Provides tools to handle missing values, such as filling or dropping them.

4. **File I/O**:
   - Reads and writes data from various formats like CSV, Excel, SQL, JSON, etc.

5. **Integration with Other Libraries**:
   - Works directly with NumPy arrays, plots through Matplotlib (e.g. `df.plot()`), and passes data to other Python libraries such as scikit-learn.
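
To make the NumPy integration in point 5 concrete, here is a minimal sketch on a small made-up DataFrame (the column names `a` and `b` are arbitrary). It shows that a NumPy function applied to a DataFrame keeps the row and column labels, and that `to_numpy()` returns a plain array for NumPy-based code.

```python
import numpy as np
import pandas as pd

# Small illustrative DataFrame (made-up values)
df = pd.DataFrame({'a': [1.0, 4.0, 9.0], 'b': [2.0, 3.0, 4.0]})

# NumPy ufuncs applied to a DataFrame return a DataFrame with the same labels
roots = np.sqrt(df)
print(roots)

# to_numpy() exposes the values as a plain ndarray for NumPy-based libraries
arr = df.to_numpy()
print(type(arr), arr.shape)
```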

---

### Installation:
To install Pandas, use the following command:
```bash
pip install pandas
```

---

### Basic Operations in Pandas:

#### 1. **Importing Pandas**
```python
import pandas as pd
```

#### 2. **Creating a Series**
```python
# Creating a Series from a list
s = pd.Series([10, 20, 30, 40])

# Creating a Series with custom index
s_custom = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(s)
# Output:
# 0    10
# 1    20
# 2    30
# 3    40
# dtype: int64
```
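
The custom index on `s_custom` matters beyond display: it drives lookups and arithmetic. A short sketch (the second Series `other` is made up for illustration) contrasting label-based `.loc` with position-based `.iloc`, and showing that arithmetic between two Series aligns on labels rather than positions:

```python
import pandas as pd

s_custom = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Label-based access uses .loc, position-based access uses .iloc
print(s_custom.loc['b'])   # 20
print(s_custom.iloc[0])    # 10

# Arithmetic aligns on index labels; labels present in only one Series give NaN
other = pd.Series([1, 2, 3], index=['b', 'c', 'd'])
total = s_custom + other
print(total)
```

Because `'a'` and `'d'` each appear in only one of the two Series, their sums are `NaN`, while `'b'` and `'c'` are added label by label.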

#### 3. **Creating a DataFrame**
```python
# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [85, 90, 95]
}
df = pd.DataFrame(data)

print(df)
# Output:
#       Name  Age  Score
# 0    Alice   25     85
# 1      Bob   30     90
# 2  Charlie   35     95
```
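
A dictionary of columns is only one way to build a DataFrame. As a sketch, the same table can also be built from a list of row dictionaries, which is convenient when records arrive one at a time; `shape` and `dtypes` then confirm what was created.

```python
import pandas as pd

# Same data as above, expressed as one dictionary per row
rows = [
    {'Name': 'Alice', 'Age': 25, 'Score': 85},
    {'Name': 'Bob', 'Age': 30, 'Score': 90},
    {'Name': 'Charlie', 'Age': 35, 'Score': 95},
]
df_rows = pd.DataFrame(rows)

print(df_rows.shape)   # (3, 3): rows, columns
print(df_rows.dtypes)  # one dtype per column
```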

#### 4. **Reading and Writing Files**
```python
# Reading from a CSV file
df = pd.read_csv('data.csv')

# Writing to a CSV file
df.to_csv('output.csv', index=False)
```
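
`data.csv` and `output.csv` above are placeholder file names. To try the round trip without creating files, a minimal sketch can use an in-memory `io.StringIO` buffer, since `to_csv` and `read_csv` both accept file-like objects:

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Write CSV text into an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Parse the same text back into a DataFrame
df_back = pd.read_csv(io.StringIO(csv_text))
print(df_back.equals(df))
```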

#### 5. **Exploring Data**
```python
print(df.head())      # First 5 rows
print(df.tail())      # Last 5 rows
df.info()             # Column dtypes, non-null counts, memory usage (prints directly and returns None)
print(df.describe())  # Count, mean, std, min, quartiles, and max of numeric columns
</replace>
```
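
Since `describe()` returns a DataFrame, individual statistics can be looked up by label. A small sketch using the Name/Age/Score data from the DataFrame example:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [85, 90, 95],
})

# By default describe() summarizes only the numeric columns (Age, Score)
stats = df.describe()
print(stats)

# Rows are statistic names, columns are the original column names
print(stats.loc['mean', 'Age'])  # 30.0
```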

#### 6. **Selecting and Filtering Data**
```python
# Selecting a column
print(df['Name'])

# Filtering rows
filtered_df = df[df['Age'] > 30]
print(filtered_df)
```
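
Filters often combine several conditions and select specific columns at once. A sketch on the same three-row sample shows `&` (with each condition parenthesized), `.loc[rows, columns]`, and the equivalent `query()` string form:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [85, 90, 95],
})

# Combine conditions with & (and) or | (or); parentheses are required
mask = (df['Age'] >= 30) & (df['Score'] > 85)

# .loc[rows, columns] filters rows and picks columns in one step
subset = df.loc[mask, ['Name', 'Score']]
print(subset)

# query() expresses the same filter as a string
same = df.query('Age >= 30 and Score > 85')[['Name', 'Score']]
print(same)
```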

#### 7. **Adding and Dropping Columns**
```python
# Adding a new column
df['Passed'] = df['Score'] > 80

# Dropping a column
df = df.drop(columns=['Passed'])
```
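
An alternative to assigning a column in place is `assign()`, which returns a new DataFrame and leaves the original untouched. A minimal sketch with made-up scores:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [78, 90]})

# assign() returns a copy with the extra column; df itself is unchanged
with_flag = df.assign(Passed=df['Score'] > 80)
print(with_flag)

# drop() likewise returns a new DataFrame; reassign it to keep the result
print(with_flag.drop(columns=['Passed']).columns.tolist())  # ['Name', 'Score']
```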

#### 8. **Handling Missing Data**
```python
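# Filling missing values: assign the result back. Calling
# df['Score'].fillna(0, inplace=True) is chained assignment, which
# pandas 2.x warns about and which does not update df under Copy-on-Write.
df['Score'] = df['Score'].fillna(0)

# Dropping rows with missing values
df = df.dropna()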
```
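
Before filling or dropping, it helps to see how many values are missing. A sketch on made-up data (the row for "Dana" is illustrative) counts missing values per column, fills gaps with the column mean instead of a constant, and drops rows only when a specific column is missing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Score': [85, np.nan, 95, np.nan],
})

# Count missing values in each column
print(df.isna().sum())

# Fill missing scores with the mean of the observed scores ((85 + 95) / 2 = 90)
filled = df.assign(Score=df['Score'].fillna(df['Score'].mean()))
print(filled)

# Drop only the rows where 'Score' is missing
dropped = df.dropna(subset=['Score'])
print(dropped)
```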

#### 9. **Grouping and Aggregation**
```python
grouped = df.groupby('Age')['Score'].mean()
print(grouped)
```
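
In the sample data every `Age` is unique, so each group above holds a single row. Grouping becomes more useful with a repeated key. A sketch with a made-up `Team` column, computing several statistics per group in a single `agg()` call:

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'B', 'A', 'B', 'A'],
    'Score': [85, 90, 95, 70, 80],
})

# One row per team, one column per statistic
summary = df.groupby('Team')['Score'].agg(['mean', 'max', 'count'])
print(summary)
```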

---

### Why Use Pandas?
- **Ease of Use**: Simplifies working with structured data.
- **Versatility**: Handles a wide variety of data sources and formats.
- **Performance**: Vectorized operations run in compiled NumPy code and are much faster than pure-Python loops, although the data must fit in memory.
- **Integration**: Works seamlessly with other Python libraries.

---

### Example: Analyzing Data in Pandas
```python
# Sample data
data = {
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Population': [8419000, 3980000, 2716000, 2328000, 1690000],
    'Area': [783.8, 503, 589, 637.5, 517]
}

# Creating a DataFrame
df = pd.DataFrame(data)

# Adding a new column for population density
df['Density'] = df['Population'] / df['Area']

print(df)
# Output:
#           City  Population   Area       Density
# 0     New York     8419000  783.8  10741.260526
# 1  Los Angeles     3980000  503.0   7912.524851
# 2      Chicago     2716000  589.0   4611.205433
# 3      Houston     2328000  637.5   3651.764706
# 4      Phoenix     1690000  517.0   3268.858801
```
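
A natural next step in this example is ranking the cities by the new `Density` column. A sketch using the same sample data, with `sort_values()` for a full ranking and `nlargest()` as a shortcut for the top rows:

```python
import pandas as pd

data = {
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Population': [8419000, 3980000, 2716000, 2328000, 1690000],
    'Area': [783.8, 503, 589, 637.5, 517]
}
df = pd.DataFrame(data)
df['Density'] = df['Population'] / df['Area']

# Rank cities from most to least densely populated
ranked = df.sort_values('Density', ascending=False)
print(ranked[['City', 'Density']].round(1))

# nlargest() returns the top-n rows by one column
top2 = df.nlargest(2, 'Density')['City'].tolist()
print(top2)  # ['New York', 'Los Angeles']
```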



