# Basics of Pandas

**Pandas** is a widely used Python library for data manipulation. It provides two core data structures, **Series** and **DataFrame**, and is commonly used for data cleaning, transformation, and aggregation.

### Key Data Structures
1. **Series**: A one-dimensional labeled array capable of holding any data type (e.g., integers, strings, floats). A Series is similar to a column in a spreadsheet or a SQL table.
2. **DataFrame**: A two-dimensional labeled data structure with columns of potentially different types. It’s like a table in a database or an Excel spreadsheet.
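
The relationship between the two structures can be sketched as follows. This is a minimal illustration, and the labels `a`, `b`, `c` and the column names `x`, `y` are made up for the example:

```python
import pandas as pd

# A Series pairs each value with a label (its index).
# The labels "a", "b", "c" are arbitrary and chosen only for this example.
scores = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(scores["b"])  # look up a value by its label

# Each column of a DataFrame is itself a Series sharing the row index.
table = pd.DataFrame({"x": [1, 2, 3], "y": ["p", "q", "r"]})
column = table["x"]
print(type(column).__name__)
print(list(column.index))
```

When no index is given, as in `table` here, Pandas assigns the default integer labels 0, 1, 2, and so on.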

### Example: Creating and Using a Pandas Series

```python
import pandas as pd

# Creating a Series from a list
data = [10, 20, 30, 40, 50]
series = pd.Series(data)

print("Series:")
print(series)
```

**Output:**

```
Series:
0    10
1    20
2    30
3    40
4    50
dtype: int64
```

### Example: Creating and Using a Pandas DataFrame

```python
import pandas as pd

# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

df = pd.DataFrame(data)

print("DataFrame:")
print(df)
```

**Output:**

```
DataFrame:
      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago
3    David   32      Houston
```

### Common Pandas Operations

#### 1. **Reading Data from a CSV File**

```python
df = pd.read_csv('data.csv')
```

This reads a CSV file into a DataFrame. You can then manipulate this data using various Pandas operations.
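
To try `read_csv` without a file on disk, one option is to feed it CSV text from memory. The sketch below assumes a small made-up CSV string standing in for `data.csv`; the variable names are illustrative only:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'.
csv_text = "Name,Age,City\nAlice,24,New York\nBob,27,Los Angeles\n"

# read_csv accepts a file path or any file-like object.
people = pd.read_csv(io.StringIO(csv_text))

print(people.shape)   # (number of rows, number of columns)
print(people.dtypes)  # column types inferred from the data
print(people.head())  # first rows, useful as a quick sanity check
```

Checking `shape` and `dtypes` right after loading is a cheap way to confirm that the file was parsed as expected.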

#### 2. **Selecting Columns**

```python
# Select the 'Name' column
names = df['Name']
print(names)
```
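
Selection is not limited to a single column. The following sketch reuses the sample data from the DataFrame example above and shows a few related forms; which one fits depends on whether a Series, a DataFrame, or a single value is wanted:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# Single brackets with one name return a Series.
name_series = df['Name']

# A list of names returns a DataFrame with just those columns.
subset = df[['Name', 'Age']]

# .loc selects by label, .iloc by integer position.
alice_age = df.loc[0, 'Age']
first_row = df.iloc[0]

print(subset)
print(alice_age)
```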

#### 3. **Filtering Data**

```python
# Filter rows where Age is greater than 25
filtered_df = df[df['Age'] > 25]
print(filtered_df)
```
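
Filtering works through a boolean mask, and masks can be combined. A minimal sketch, again using the sample data from the DataFrame example (the second condition on `City` is added only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# The comparison produces a boolean Series with one True/False per row.
mask = df['Age'] > 25
print(mask.tolist())

# Combine conditions with & (and) or | (or).
# Each condition needs its own parentheses.
older_not_houston = df[(df['Age'] > 25) & (df['City'] != 'Houston')]
print(older_not_houston)
```

Python's `and` and `or` keywords do not work element-wise on Series, which is why the `&` and `|` operators are used here.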

#### 4. **Adding a New Column**

```python
# Add a new column 'Salary' (the list must contain one value per row)
df['Salary'] = [70000, 80000, 75000, 90000]
print(df)
```
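
Besides a list of values, a new column can be computed from existing columns or filled with a single value. The sketch below uses hypothetical column names (`AgeInMonths`, `Country`) and also shows what happens when a list has the wrong length:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
})

# Derive a column from an existing one (applied row by row).
df['AgeInMonths'] = df['Age'] * 12

# A single value is repeated for every row.
df['Country'] = 'USA'

# A list whose length differs from the number of rows raises ValueError.
length_error = None
try:
    df['Salary'] = [70000, 80000]
except ValueError as exc:
    length_error = exc
    print("Could not add column:", exc)

print(df)
```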

#### 5. **Dropping a Column**

```python
# Drop the 'City' column (drop returns a new DataFrame)
df = df.drop(columns='City')
print(df)
```
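
One detail worth seeing in isolation is that `drop` returns a new DataFrame and leaves the original untouched unless the result is assigned back. A small sketch with the sample data; the `Country` column is deliberately one that does not exist:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# drop returns a copy without the column; df itself is unchanged.
trimmed = df.drop(columns=['City'])
print(list(df.columns))
print(list(trimmed.columns))

# errors='ignore' skips labels that are not present instead of raising KeyError.
unchanged = df.drop(columns=['Country'], errors='ignore')
print(list(unchanged.columns))
```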

#### 6. **Descriptive Statistics**

```python
# Get basic statistics for the numerical columns
statistics = df.describe()
print(statistics)
```
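
By default `describe()` summarizes only the numeric columns. The sketch below, using the sample data, shows how individual values can be read from the result and how `include='all'` extends the summary to text columns:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# Only the numeric column 'Age' appears in the default summary.
summary = df.describe()
print(list(summary.columns))

# The result is itself a DataFrame, so single statistics can be looked up.
mean_age = summary.loc['mean', 'Age']
median_age = df['Age'].median()
print(mean_age, median_age)

# include='all' adds rows such as 'unique' and 'top' for non-numeric columns.
full_summary = df.describe(include='all')
print(full_summary)
```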

#### 7. **Saving Data to a CSV File**

```python
df.to_csv('output.csv', index=False)
```

This saves the DataFrame to a CSV file. Passing `index=False` leaves the row index out of the file.
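
The effect of the `index` argument is easiest to see by comparing the generated text. In this sketch, `to_csv` is called without a path so that it returns the CSV as a string instead of writing a file; the two-row DataFrame is made up for the example:

```python
import io

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [24, 27]})

# Without a path, to_csv returns the CSV content as a string.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])     # header starts with an empty index column
print(without_index.splitlines()[0])  # header contains only the real columns

# Reading the text back gives the same columns and values.
restored = pd.read_csv(io.StringIO(without_index))
print(restored)
```

If the index is written out and the file is later read with default settings, it comes back as an extra unnamed column, which is the usual reason for passing `index=False`.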

### Summary

- **Pandas** is a crucial library for data manipulation in Python, offering robust data structures like **Series** and **DataFrame**.
- **Series**: A one-dimensional labeled array, similar to a column in a table.
- **DataFrame**: A two-dimensional labeled data structure, similar to a table in a database.
- Common operations include reading and writing CSV files, selecting columns, filtering rows, adding or dropping columns, and computing descriptive statistics.

These basics provide a foundation for more advanced data analysis with Pandas, which is why the library is a standard tool in data science and analytics.

### Example: Creating Dummy Data and Saving It to a CSV File

```python
import pandas as pd
import numpy as np

# Create dummy data (the seed makes the random values reproducible)
np.random.seed(0)
names = ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Hank', 'Ivy', 'Jack']
cities = ['New York', 'Los Angeles', 'Chicago', 'Houston', 'San Francisco', 'Seattle', 'Austin', 'Boston', 'Denver', 'Miami']

data = {
    'Name': np.random.choice(names, 50),
    'Age': np.random.randint(20, 60, size=50),
    'Salary': np.random.randint(50000, 150000, size=50),
    'City': np.random.choice(cities, 50)
}

# Create a DataFrame
df = pd.DataFrame(data)

# Save the DataFrame to a CSV file
df.to_csv('dummy_data_50_rows.csv', index=False)
```

### Example: Applying the Common Operations

```python
import pandas as pd

# Read data from a CSV file (this walkthrough assumes 'data.csv' has exactly four rows)
df = pd.read_csv('data.csv')

# Select the 'Name' column
names = df['Name']
print(names)

# Filter rows where Age is greater than 25
filtered_df = df[df['Age'] > 25]
print(filtered_df)

# Set the 'Salary' column, one value per row
# (this overwrites the column if it already exists, as it does here)
df['Salary'] = [70000, 80000, 75000, 90000]
print(df)

# Drop the 'City' column
df = df.drop(columns='City')
print(df)

# Get basic statistics for the numerical columns
statistics = df.describe()
print(statistics)

# Save the result without the row index
df.to_csv('output.csv', index=False)
```

**Output:**

```
0    Frank
1    Alice
2    David
3     Hank
Name: Name, dtype: object
    Name  Age  Salary           City
1  Alice   56   74777  San Francisco
3   Hank   37   62843       New York
    Name  Age  Salary           City
0  Frank   20   70000       New York
1  Alice   56   80000  San Francisco
2  David   25   75000        Houston
3   Hank   37   90000       New York
    Name  Age  Salary
0  Frank   20   70000
1  Alice   56   80000
2  David   25   75000
3   Hank   37   90000
             Age        Salary
count   4.000000      4.000000
mean   34.500000  78750.000000
std    16.010413   8539.125638
min    20.000000  70000.000000
25%    23.750000  73750.000000
50%    31.000000  77500.000000
75%    41.750000  82500.000000
max    56.000000  90000.000000
```
