# Series and DataFrames in Pandas

This notebook covers the two fundamental data structures in Pandas: Series (1-dimensional) and DataFrame (2-dimensional). You'll learn how to create these objects, inspect their attributes, and compute basic summary statistics on them.

## Import Pandas

In [1]:
import pandas as pd
import numpy as np

## Pandas Series

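A Series is a one-dimensional labeled array that can hold any data type (integers, floats, strings, arbitrary Python objects). Each value is paired with an index label; if you don't supply labels, Pandas uses the integers 0, 1, 2, and so on.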

In [2]:
# Creating Series from different sources

# From a list
s1 = pd.Series([1, 3, 5, 6, 8])
print('Series from list:')
print(s1)
print()

# From a list with custom index
s2 = pd.Series([1, 3, 5, 6, 8], index=['a', 'b', 'c', 'd', 'e'])
print('Series with custom index:')
print(s2)
print()

# From a NumPy array
arr = np.array([10, 20, 30, 40, 50])
s3 = pd.Series(arr, index=['A', 'B', 'C', 'D', 'E'])
print('Series from NumPy array:')
print(s3)
print()

# From a dictionary
data = {'Alice': 25, 'Bob': 30, 'Charlie': 35}
s4 = pd.Series(data)
print('Series from dictionary:')
print(s4)

Series from list:
0    1
1    3
2    5
3    6
4    8
dtype: int64

Series with custom index:
a    1
b    3
c    5
d    6
e    8
dtype: int64

Series from NumPy array:
A    10
B    20
C    30
D    40
E    50
dtype: int64

Series from dictionary:
Alice      25
Bob        30
Charlie    35
dtype: int64
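
Because every value in a Series carries a label, it can be looked up either by label or by position. The following is a minimal sketch of that distinction, rebuilding the same data as `s2` above; the variable names are illustrative only.

```python
import pandas as pd

# Same values and labels as s2 above
s = pd.Series([1, 3, 5, 6, 8], index=['a', 'b', 'c', 'd', 'e'])

by_label = s.loc['c']          # label-based lookup
by_position = s.iloc[2]        # position-based lookup (0-indexed)
label_slice = s.loc['b':'d']   # label slices include both endpoints
pos_slice = s.iloc[1:3]        # positional slices exclude the end position

print('By label:', by_label)
print('By position:', by_position)
print(label_slice)
print(pos_slice)
```

Using `.loc` and `.iloc` explicitly avoids ambiguity when the index itself contains integers.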


In [3]:
# Series attributes and methods
s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

print('Series:')
print(s)
print('Values:', s.values)
print('Index:', s.index)
print('Data type:', s.dtype)
print('Shape:', s.shape)
print('Size:', s.size)
print()

# Basic operations
print('Sum:', s.sum())
print('Mean:', s.mean())
print('Max:', s.max())
print('Min:', s.min())
# std() returns the sample standard deviation (ddof=1) by default
print('Standard deviation:', s.std())

Series:
a    10
b    20
c    30
d    40
e    50
dtype: int64
Values: [10 20 30 40 50]
Index: Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
Data type: int64
Shape: (5,)
Size: 5

Sum: 150
Mean: 30.0
Max: 50
Min: 10
Standard deviation: 15.811388300841896
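
Beyond aggregations such as `sum()` and `mean()`, a Series supports element-wise arithmetic and boolean filtering, and arithmetic between two Series matches values by index label rather than by position. The sketch below illustrates this with the same `s` as above; the second Series, `other`, is made up for the example.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

doubled = s * 2                  # element-wise arithmetic
above_mean = s[s > s.mean()]     # boolean mask keeps values greater than 30

# Hypothetical second Series sharing only some labels with s
other = pd.Series([1, 2, 3], index=['c', 'd', 'z'])
aligned = s + other              # labels present in only one Series give NaN

print(doubled)
print(above_mean)
print(aligned)
```

Only the labels `c` and `d` appear in both Series, so only those two positions in `aligned` hold numbers; the result is converted to floating point to accommodate the missing values.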


## Pandas DataFrame

A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.

In [4]:
# Creating DataFrames from different sources

# From a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'London', 'Paris', 'Tokyo'],
    'Salary': [50000, 60000, 70000, 80000]
}
df1 = pd.DataFrame(data)
print('DataFrame from dictionary:')
print(df1)
print()

# From a list of lists
data_list = [
    ['Alice', 25, 'New York', 50000],
    ['Bob', 30, 'London', 60000],
    ['Charlie', 35, 'Paris', 70000],
    ['David', 40, 'Tokyo', 80000]
]
df2 = pd.DataFrame(data_list, columns=['Name', 'Age', 'City', 'Salary'])
print('DataFrame from list of lists:')
print(df2)
print()

# From a NumPy array (values are random, so the output differs on every run)
arr = np.random.rand(4, 3)
df3 = pd.DataFrame(arr, columns=['A', 'B', 'C'], index=['Row1', 'Row2', 'Row3', 'Row4'])
print('DataFrame from NumPy array:')
print(df3)

DataFrame from dictionary:
      Name  Age      City  Salary
0    Alice   25  New York   50000
1      Bob   30    London   60000
2  Charlie   35     Paris   70000
3    David   40     Tokyo   80000

DataFrame from list of lists:
      Name  Age      City  Salary
0    Alice   25  New York   50000
1      Bob   30    London   60000
2  Charlie   35     Paris   70000
3    David   40     Tokyo   80000

DataFrame from NumPy array:
             A         B         C
Row1  0.532078  0.253789  0.594458
Row2  0.702915  0.840592  0.659638
Row3  0.851163  0.387950  0.094497
Row4  0.149373  0.132628  0.273282
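
Once a DataFrame exists, the most common next step is pulling out columns, rows, or single cells. The following sketch, which rebuilds the dictionary-based DataFrame from above, shows the basic selection forms; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'London', 'Paris', 'Tokyo'],
    'Salary': [50000, 60000, 70000, 80000],
})

ages = df['Age']                   # a single column is a Series
subset = df[['Name', 'Salary']]    # a list of columns gives a DataFrame
first_row = df.iloc[0]             # a row selected by position is a Series
bob_city = df.loc[1, 'City']       # one cell: row label 1, column 'City'

print(type(ages).__name__, type(subset).__name__)
print(first_row)
print('City in row 1:', bob_city)
```

Note that a single column comes back as a Series, which is why everything shown in the Series section applies to DataFrame columns as well.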


In [5]:
# DataFrame attributes and methods
# Reuses the `data` dictionary defined in the previous cell
df = pd.DataFrame(data)

print('DataFrame:')
print(df)
print()
print('Shape:', df.shape)
print('Columns:', list(df.columns))
print('Index:', list(df.index))
print('Data types:')
print(df.dtypes)
print()

# Basic information
print('Info:')
df.info()  # info() prints its report itself and returns None, so it is not wrapped in print()
print()

# Statistical summary
print('Statistical summary:')
print(df.describe())

DataFrame:
      Name  Age      City  Salary
0    Alice   25  New York   50000
1      Bob   30    London   60000
2  Charlie   35     Paris   70000
3    David   40     Tokyo   80000

Shape: (4, 4)
Columns: ['Name', 'Age', 'City', 'Salary']
Index: [0, 1, 2, 3]
Data types:
Name      object
Age        int64
City      object
Salary     int64
dtype: object

Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    4 non-null      object
 1   Age     4 non-null      int64 
 2   City    4 non-null      object
 3   Salary  4 non-null      int64 
dtypes: int64(2), object(2)
memory usage: 260.0+ bytes

Statistical summary:
             Age        Salary
count   4.000000      4.000000
mean   32.500000  65000.000000
std     6.454972  12909.944487
min    25.000000  50000.000000
25%    28.750000  57500.000000
50%    32.500000  65000.000000
75%    36.250000  72500.000000
max    40.000000  80000.000000
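
`describe()` summarizes every numeric column over all rows. To summarize only part of the data, or to add a derived column first, a boolean mask and `assign()` can be combined as in this sketch. The column name `MonthlySalary` and the age threshold are assumptions chosen for illustration, not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'London', 'Paris', 'Tokyo'],
    'Salary': [50000, 60000, 70000, 80000],
})

# Keep only the rows where the condition holds
over_30 = df[df['Age'] > 30]
mean_salary_over_30 = over_30['Salary'].mean()

# assign() returns a new DataFrame with the extra column; df itself is unchanged
# 'MonthlySalary' is a hypothetical column name used for this example
df_monthly = df.assign(MonthlySalary=df['Salary'] / 12)

print(over_30)
print('Mean salary (Age > 30):', mean_salary_over_30)
print(df_monthly)
```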

## Summary

You have learned about Pandas' core data structures:
- **Series**: 1-dimensional labeled array
- **DataFrame**: 2-dimensional labeled data structure

Both Series and DataFrame provide methods for data manipulation, indexing, and analysis. Most data analysis work is done on DataFrames, and each DataFrame column is itself a Series.