# Level 2: Core Data Structures

In this level, we dive into the two fundamental data structures in Pandas: the **Series** and the **DataFrame**.

In [1]:
import pandas as pd
import numpy as np

## 2.1 Series

A **Series** is a one-dimensional labeled array, similar to a column in a spreadsheet or a single vector of data. It can hold any data type (integers, strings, floats, Python objects, etc.).

### Creating a Series

**From a list:**

In [2]:
s_from_list = pd.Series([10, 20, 30, 40, 50])
print(s_from_list)

0    10
1    20
2    30
3    40
4    50
dtype: int64


**From a dictionary:**
The keys of the dictionary are used as the index.

In [3]:
s_from_dict = pd.Series({'a': 1, 'b': 2, 'c': 3})
print(s_from_dict)

a    1
b    2
c    3
dtype: int64
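
Labels do not have to come from dictionary keys. As a minimal sketch, the `index` argument can supply them for a list, and `name` gives the Series a title. The readings and the names `temps` and `temp_c` below are made up for illustration.

```python
import pandas as pd

# Hypothetical temperature readings; labels and values are illustrative only
temps = pd.Series([21.5, 23.0, 19.8], index=["mon", "tue", "wed"], name="temp_c")

print(temps.index.tolist())  # the row labels
print(temps.to_numpy())      # the underlying values as a NumPy array
print(temps.dtype, temps.name)
```

Because the list contains floats, the resulting dtype is `float64` rather than `int64`.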


### Indexing and Slicing

In [4]:
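# Accessing by position (.iloc always means position, whatever the index labels are)
print(f"Element at position 0: {s_from_list.iloc[0]}")

# Slicing by position (the end position, 4, is excluded)
print("\nSlicing from position 1 to 3:")
print(s_from_list.iloc[1:4])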

# Accessing by label
print(f"\nElement with label 'b': {s_from_dict['b']}")

Element at position 0: 10

Slicing from position 1 to 3:
1    20
2    30
3    40
dtype: int64

Element with label 'b': 2
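
The difference between labels and positions is easy to miss when the index is the default 0, 1, 2, ... The sketch below uses a deliberately shuffled integer index (an illustrative assumption, not part of the examples above) to show how `.loc` and `.iloc` differ.

```python
import pandas as pd

# Integer labels that deliberately differ from the positions 0, 1, 2
s = pd.Series([10, 20, 30], index=[2, 0, 1])

by_label = s.loc[0]      # the element whose label is 0 -> 20
by_position = s.iloc[0]  # the element in the first position -> 10

# Label slices include the end label; positional slices exclude the end position
d = pd.Series({'a': 1, 'b': 2, 'c': 3})
label_slice = d.loc['a':'b']  # rows 'a' and 'b'
pos_slice = d.iloc[0:1]       # row 'a' only

print(by_label, by_position)
print(label_slice.tolist(), pos_slice.tolist())
```

Using `.loc` for labels and `.iloc` for positions keeps the intent explicit and avoids surprises when the index is not the default range.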


### Basic Operations

In [5]:
# Math operations (vectorized)
s_math = pd.Series([1, 2, 3, 4, 5])
print("Original Series:")
print(s_math)

print("\nSeries * 2:")
print(s_math * 2)

# Filtering
print("\nElements greater than 3:")
print(s_math[s_math > 3])

Original Series:
0    1
1    2
2    3
3    4
4    5
dtype: int64

Series * 2:
0     2
1     4
2     6
3     8
4    10
dtype: int64

Elements greater than 3:
3    4
4    5
dtype: int64
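
Filters can be combined, and a Series also offers reductions such as `sum()` and `mean()`. A short sketch, reusing the same values as above (the variable names are illustrative):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Combine conditions with & (and), | (or), ~ (not).
# Each comparison needs its own parentheses because & binds tighter than > and <.
mask = (s > 1) & (s < 5)
middle = s[mask]    # values 2, 3, 4
outside = s[~mask]  # values 1, 5

total = s.sum()     # 15
average = s.mean()  # 3.0

print(middle.tolist(), outside.tolist(), total, average)
```

Python's `and`, `or`, and `not` do not work element-wise on a Series, which is why the bitwise operators are used here.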


## 2.2 DataFrame

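A **DataFrame** is a two-dimensional labeled table and the most commonly used Pandas object. It resembles an Excel spreadsheet or a SQL table: all columns share one row index, and each column has a name and its own data type.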

### Creating a DataFrame

**From a dictionary of lists:**
This is one of the most common ways to build a DataFrame by hand. Each dictionary key becomes a column name, and its list becomes that column's data, so all lists must have the same length.

In [6]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df_from_dict = pd.DataFrame(data)
print(df_from_dict)

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


**From a list of lists:**
You need to specify the column names separately.

In [7]:
data_list = [
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'Los Angeles'],
    ['Charlie', 35, 'Chicago']
]
columns = ['Name', 'Age', 'City']
df_from_list = pd.DataFrame(data_list, columns=columns)
print(df_from_list)

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


**From a NumPy array:**

In [8]:
data_np = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
df_from_np = pd.DataFrame(data_np, columns=['A', 'B', 'C'])
print(df_from_np)

   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9


**From external files (e.g., CSV):**
This is covered in detail in Level 3, but here's a quick preview. We first write a small CSV file so there is something to read back; `index=False` keeps the row labels out of the file.

In [9]:
df_from_dict.to_csv('sample_data.csv', index=False)
df_from_csv = pd.read_csv('sample_data.csv')
print("DataFrame read from CSV:")
print(df_from_csv)

DataFrame read from CSV:
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
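
If writing a file to disk is not desirable, `pd.read_csv` also accepts a file-like object. The sketch below assumes the CSV text is already available as a string (the `csv_text` variable is illustrative) and wraps it in `io.StringIO`.

```python
import io
import pandas as pd

# CSV content held in memory instead of in a file
csv_text = (
    "Name,Age,City\n"
    "Alice,25,New York\n"
    "Bob,30,Los Angeles\n"
    "Charlie,35,Chicago\n"
)

df_mem = pd.read_csv(io.StringIO(csv_text))
print(df_mem)

# Called without a path, to_csv returns the CSV text as a string
csv_out = df_mem.to_csv(index=False)
round_trip = pd.read_csv(io.StringIO(csv_out))
print(round_trip.equals(df_mem))
```

This is handy for small experiments and tests because nothing is left behind on disk.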


### Understanding Rows, Columns, and Index
- **Rows:** Horizontal entries.
- **Columns:** Vertical entries (each column is a Series).
- **Index:** The labels for the rows. By default, it's a range of integers (0, 1, 2, ...).
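
These three ideas can be seen directly in code. The sketch below rebuilds the same small table, pulls out a column and a row, and then replaces the default index with the `Name` column (the variable names are illustrative).

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

ages = df['Age']        # a column is a Series that shares the DataFrame's index
first_row = df.iloc[0]  # a row comes back as a Series labeled by the column names

# Use the Name column as the row labels instead of 0, 1, 2
by_name = df.set_index('Name')
bob_city = by_name.loc['Bob', 'City']

print(type(ages).__name__)
print(first_row['Name'])
print(bob_city)
```

`set_index` returns a new DataFrame and leaves `df` unchanged.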

## 2.3 Basic Attributes

These attributes provide quick information about the DataFrame.

In [10]:
df = df_from_dict  # a second name for the same object, not a copy
print("Original DataFrame:")
print(df)

Original DataFrame:
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


In [11]:
print(f"Shape (rows, columns): {df.shape}")
print(f"Size (total elements): {df.size}")
print(f"Number of dimensions: {df.ndim}")

Shape (rows, columns): (3, 3)
Size (total elements): 9
Number of dimensions: 2


In [12]:
print("Data types of each column:")
print(df.dtypes)

Data types of each column:
Name    object
Age      int64
City    object
dtype: object
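
Data types can also be used to pick columns or be changed explicitly. A minimal sketch on the same table, using `select_dtypes` and `astype` (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Keep only the numeric columns
numeric_cols = df.select_dtypes(include='number').columns.tolist()

# Convert a column to another dtype; this returns a new Series
age_float = df['Age'].astype('float64')

print(numeric_cols)
print(age_float.dtype)
```

`astype` does not modify `df`; assign the result back to the column if the change should persist.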


In [13]:
print(f"Columns: {df.columns}")
print(f"Index: {df.index}")

Columns: Index(['Name', 'Age', 'City'], dtype='object')
Index: RangeIndex(start=0, stop=3, step=1)


### `.info()` and `.describe()`

**.info():** Prints a concise summary of the DataFrame: the index type and range, each column's dtype and non-null count, and the approximate memory usage.

In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    3 non-null      object
 1   Age     3 non-null      int64 
 2   City    3 non-null      object
dtypes: int64(1), object(2)
memory usage: 200.0+ bytes
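
The non-null counts become more informative once data is missing. The sketch below uses a hypothetical table with one missing age and retrieves the same counts programmatically with `count()` and `isna().sum()`.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value
df_missing = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, np.nan, 35],
})

non_null = df_missing.count()       # non-null values per column
n_missing = df_missing.isna().sum() # missing values per column

print(non_null)
print(n_missing)
print(df_missing['Age'].dtype)
```

Note that the `Age` column is stored as `float64` here, because `NaN` is a floating-point value.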


**.describe():** By default, generates descriptive statistics for the numeric columns only (count, mean, std, min, max, and quartiles). Here `Name` and `City` are skipped because they hold text.

In [15]:
df.describe()

        Age
count   3.0
mean   30.0
std     5.0
min    25.0
25%    27.5
50%    30.0
75%    32.5
max    35.0
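
Text columns can be summarized as well. As a sketch, calling `describe()` on only the text columns reports `count`, `unique`, `top`, and `freq`, and `include='all'` covers every column at once. The table below is hypothetical and repeats a city on purpose so that `top` is unambiguous.

```python
import pandas as pd

# Hypothetical data; 'Chicago' appears twice so the most frequent value is clear
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['Chicago', 'Chicago', 'New York'],
})

text_summary = people[['Name', 'City']].describe()  # count, unique, top, freq
age_summary = people['Age'].describe()              # numeric statistics for one column
full_summary = people.describe(include='all')       # both kinds of rows, NaN where not applicable

print(text_summary)
print(age_summary)
```

Individual statistics can be read from the result by label, for example `age_summary['mean']`.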
