# 1. Series
## 1.1 Creation
A Pandas Series is a one-dimensional, labeled array that can hold values of any data type, such as integers, floats, and strings. It is similar to a single column in a table or spreadsheet, with an index label attached to each value.

```python
import pandas as pd

# Creating a Series from a list
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
series
```

```
0    1
1    2
2    3
3    4
4    5
dtype: int64
```

```python
# Creating a Series with a custom index
data = [1, 2, None, 4, 'c']
index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=index)
series
```

```
a       1
b       2
c    None
d       4
e       c
dtype: object
```
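
The `dtype: object` in the second example comes from mixing numbers, `None`, and a string in one Series. The following is a minimal sketch of how the inferred dtype depends on the input values; the variable names are illustrative and not part of the original example.

```python
import pandas as pd

# All integers: pandas infers an integer dtype
ints = pd.Series([1, 2, 3])

# A missing value among numbers is stored as NaN, so the dtype becomes float64
with_missing = pd.Series([1, 2, None, 4])

# Adding a string forces the generic "object" dtype
mixed = pd.Series([1, 2, None, 4, 'c'])

print(ints.dtype)          # int64
print(with_missing.dtype)  # float64
print(mixed.dtype)         # object
```

Numeric methods such as `sum()` or `mean()` are generally only meaningful on the first two; an `object` Series that contains strings usually needs cleaning first.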

## 1.2 Attributes and Methods
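A Pandas Series comes with a variety of attributes and methods that make data manipulation straightforward. Some of the most commonly used are listed below. The examples assume a numeric `series`, a `series_with_nan` that contains missing values, and NumPy imported as `np`.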

| **Method/Attribute** | **Description**                                      | **Example Code**                                             |
|----------------------|------------------------------------------------------|--------------------------------------------------------------|
| `series.sum()`       | Calculates the sum of the Series.                    | `series.sum()`                                               |
| `series.mean()`      | Calculates the mean of the Series.                   | `series.mean()`                                              |
| `series.max()`       | Finds the maximum value in the Series.               | `series.max()`                                               |
| `series.isnull()`    | Checks for null values in the Series, returns a Boolean Series. | `series_with_nan.isnull()`                                   |
| `series.notnull()`   | Checks for non-null values in the Series, returns a Boolean Series. | `series_with_nan.notnull()`                                  |
| `series.fillna()`    | Fills null values with the specified value.          | `series_with_nan.fillna(0)`                                  |
| `series.apply()`     | Applies a function to each element of the Series.    | `series.apply(lambda x: x ** 2)`                             |
| `series.count()`     | Counts the non-null elements in the Series.          | `series.count()`                                             |
| `series.std()`       | Calculates the standard deviation of the Series.     | `series.std()`                                               |
| `series.median()`    | Calculates the median of the Series.                 | `series.median()`                                            |
| `series.quantile()`  | Calculates the specified quantile of the Series.     | `series.quantile(0.25)`                                      |
| `series.value_counts()` | Counts the occurrences of each value in the Series.  | `series.value_counts()`                                      |
| `series.sort_values()` | Sorts the Series by its values.                      | `series.sort_values()`                                       |
| `series.rank()`         | Ranks the values in the Series.                     | `series.rank()`                                              |
| `series.cumsum()`       | Calculates the cumulative sum of the Series.        | `series.cumsum()`                                            |
| `series.shift()`        | Shifts the values in the Series by the specified number of periods. | `series.shift(1)`                                            |
| `np.log(series)`        | Applies the NumPy log function to the Series.       | `np.log(series)`                                             |
| `series.reindex()`      | Reindexes the Series with the specified index, filling missing values with the specified value. | `series.reindex(['a', 'b', 'c', 'd', 'e', 'f'], fill_value=0)` |
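
The table refers to `series_with_nan` and `np` without defining them. The sketch below is one way to set them up and try several of the listed methods; the sample values and variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# A numeric Series; the labels are arbitrary
series = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

total = series.sum()                       # 15
first_quartile = series.quantile(0.25)     # 2.0
running_total = series.cumsum()            # 1, 3, 6, 10, 15
shifted = series.shift(1)                  # values move down one slot; 'a' becomes NaN
squared = series.apply(lambda x: x ** 2)   # 1, 4, 9, 16, 25
logs = np.log(series)                      # element-wise natural log, index preserved

# Labels absent from the original index ('f' here) receive fill_value
extended = series.reindex(['a', 'b', 'c', 'd', 'e', 'f'], fill_value=0)

# A Series with a missing value, for the null-handling methods
series_with_nan = pd.Series([1.0, np.nan, 3.0])
null_mask = series_with_nan.isnull()       # False, True, False
filled = series_with_nan.fillna(0)         # 1.0, 0.0, 3.0
non_null = series_with_nan.count()         # 2
```

None of these calls modifies `series` or `series_with_nan`; each returns a new object or a scalar.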


# 2. DataFrame
## 2.1 Creation
A Pandas DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).
```python
import pandas as pd

# Creating a DataFrame from a dictionary
data = {
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': ['foo', 'bar', 'baz', 'qux', 'quux']
}
df = pd.DataFrame(data)
df
```

```
   A   B     C
0  1  10   foo
1  2  20   bar
2  3  30   baz
3  4  40   qux
4  5  50  quux
```

```python
# Creating a DataFrame from a list of dictionaries
data = [
    {'A': 1, 'B': 10, 'C': 'foo'},
    {'A': 2, 'B': 20, 'C': 'bar'},
    {'A': 3, 'B': 30, 'C': 'baz'}
]
df = pd.DataFrame(data)
df
```

```
   A   B    C
0  1  10  foo
1  2  20  bar
2  3  30  baz
```

```python
# Creating a DataFrame from a list of lists
data = [
    [1, 10, 'foo'],
    [2, 20, 'bar'],
    [3, 30, 'baz']
]
columns = ['A', 'B', 'C']
df = pd.DataFrame(data, columns=columns)
df
```

```
   A   B    C
0  1  10  foo
1  2  20  bar
2  3  30  baz
```


## 2.2 Attributes and Methods
A Pandas DataFrame comes with a variety of attributes and methods that make data manipulation straightforward. Some of the most commonly used are listed below:

| **Attribute/Method** | **Description**                                         | **Example Code**                                                |
|----------------------|---------------------------------------------------------|-----------------------------------------------------------------|
| `df.shape`           | Returns a tuple representing the dimensionality of the DataFrame. | `df.shape`                                                     |
| `df.columns`         | Returns the column labels of the DataFrame.             | `df.columns`                                                   |
| `df.index`           | Returns the row labels of the DataFrame.                | `df.index`                                                     |
| `df.dtypes`          | Returns the data types of each column in the DataFrame. | `df.dtypes`                                                    |
| `df.head()`          | Returns the first n rows of the DataFrame (default is 5). | `df.head()`                                                    |
| `df.tail()`          | Returns the last n rows of the DataFrame (default is 5). | `df.tail()`                                                    |
| `df.describe()`      | Generates descriptive statistics of numeric columns.    | `df.describe()`                                                |
| `df.sum()`           | Returns the sum of the values for each column.          | `df.sum()`                                                     |
| `df.mean()`          | Returns the mean of the values for each column.         | `df.mean()`                                                    |
| `df.max()`           | Returns the maximum value of each column.               | `df.max()`                                                     |
| `df['column_name']`  | Selects a column.                                        | `df['A']`                                                      |
| `df[['col1', 'col2']]` | Selects multiple columns.                               | `df[['A', 'B']]`                                               |
| `df[start:stop]`     | Selects rows by slicing (start is inclusive, stop is exclusive). | `df[0:3]`                                                      |
| `df.loc[]`           | Accesses a group of rows and columns by labels or a boolean array; label slices include both endpoints. | `df.loc[2:4, 'A']`                                             |
| `df.iloc[]`          | Accesses a group of rows and columns by integer position; the stop position is excluded. | `df.iloc[2:5, 0]`                                              |
| `df[df['col'] > value]` | Selects rows where the column value meets a condition.    | `df[df['A'] > 2]`                                              |
| `df.drop()`          | Drops specified labels from rows or columns.            | `df.drop('D', axis=1)`                                         |
| `df.sort_values()`   | Sorts the DataFrame by the values of one or more columns. | `df.sort_values(by='B', ascending=False)`                      |
| `df.fillna()`        | Fills missing values with a specified value.            | `df.fillna(0)`                                                 |
| `df.apply()`         | Applies a function along an axis of the DataFrame; on a single column it applies the function to each element. | `df['A'].apply(lambda x: x ** 2)`                              |
| `df.cumsum()`        | Returns the cumulative sum over a DataFrame or Series.  | `df.cumsum()`                                                  |
| `df.reindex()`       | Conforms the DataFrame to a new index with optional filling logic. | `df.reindex(['a', 'b', 'c', 'd', 'e', 'f'], fill_value=0)`     |
| `df.value_counts()`  | Returns a Series containing counts of unique values; the example counts the values of one column. | `df['A'].value_counts()`                                       |
| `df.std()`           | Returns the standard deviation of the values for each column. | `df.std()`                                                     |
| `df.median()`        | Returns the median of the values for each column.       | `df.median()`                                                  |
| `df.quantile()`      | Returns values at the given quantile over requested axis. | `df.quantile(0.25)`                                            |
| `df.rank()`          | Computes numerical data ranks (1 through n) along axis. | `df.rank()`                                                    |
| `df.shift()`         | Shifts the values in the DataFrame by the desired number of periods. | `df.shift(1)`                                                  |
| `np.log(df)`         | Applies the NumPy log function to the DataFrame.        | `np.log(df)`                                                   |


```python
# Sample DataFrame
data = {
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': ['foo', 'bar', 'baz', 'qux', 'quux']
}
df = pd.DataFrame(data)
df
```

```
   A   B     C
0  1  10   foo
1  2  20   bar
2  3  30   baz
3  4  40   qux
4  5  50  quux
```


```python
df.shape
```

```
(5, 3)
```

```python
df.columns
```

```
Index(['A', 'B', 'C'], dtype='object')
```

```python
df.index
```

```
RangeIndex(start=0, stop=5, step=1)
```

```python
df.dtypes
```

```
A     int64
B     int64
C    object
dtype: object
```

```python
# Basic statistics (only numeric columns)
print("DataFrame description:\n", df.describe(include=[int, float]))
print("Sum of each column:\n", df.sum(numeric_only=True))
print("Mean of each column:\n", df.mean(numeric_only=True))
print("Max of each column:\n", df.max(numeric_only=True))
```

```
DataFrame description:
               A          B
count  5.000000   5.000000
mean   3.000000  30.000000
std    1.581139  15.811388
min    1.000000  10.000000
25%    2.000000  20.000000
50%    3.000000  30.000000
75%    4.000000  40.000000
max    5.000000  50.000000
Sum of each column:
 A     15
B    150
dtype: int64
Mean of each column:
 A     3.0
B    30.0
dtype: float64
Max of each column:
 A     5
B    50
dtype: int64
```
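
The statistics above pass `numeric_only=True` because column `C` holds strings. An alternative, sketched here under the assumption that only the numeric columns are of interest, is to select them once and then call the remaining statistics from the table (`std`, `median`, `quantile`) without extra arguments. `select_dtypes` is not used elsewhere in this document.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': ['foo', 'bar', 'baz', 'qux', 'quux']
})

# Keep only the numeric columns (A and B); C is dropped from this view
numeric = df.select_dtypes(include='number')

spread = numeric.std()                   # sample standard deviation per column
middle = numeric.median()                # median per column
first_quartile = numeric.quantile(0.25)  # 25th percentile per column

print(spread)
print(middle)
print(first_quartile)
```

The standard deviations agree with the `std` row of the `describe()` output above.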


```python
# Applying functions
df['A_squared'] = df['A'].apply(lambda x: x ** 2)
df
```

```
   A   B     C  A_squared
0  1  10   foo          1
1  2  20   bar          4
2  3  30   baz          9
3  4  40   qux         16
4  5  50  quux         25
```

```python
# Conditional selection
df[df['A'] > 2]
```

```
   A   B     C  A_squared
2  3  30   baz          9
3  4  40   qux         16
4  5  50  quux         25
```

```python
# Label-based selection: rows 2 through 4 (inclusive), columns A and B
df.loc[2:4, ['A', 'B']]
```

```
   A   B
2  3  30
3  4  40
4  5  50
```

```python
# Adding new columns
df['D'] = df['A'] + df['B']
df
```

```
   A   B     C  A_squared   D
0  1  10   foo          1  11
1  2  20   bar          4  22
2  3  30   baz          9  33
3  4  40   qux         16  44
4  5  50  quux         25  55
```

```python
# Dropping columns (returns a new DataFrame; df itself keeps column D)
df.drop('D', axis=1)
```

```
   A   B     C  A_squared
0  1  10   foo          1
1  2  20   bar          4
2  3  30   baz          9
3  4  40   qux         16
4  5  50  quux         25
```

```python
# Sorting
df.sort_values(by='B', ascending=False)
```

```
   A   B     C  A_squared   D
4  5  50  quux         25  55
3  4  40   qux         16  44
2  3  30   baz          9  33
1  2  20   bar          4  22
0  1  10   foo          1  11
```

```python
# Handling missing values
df_with_nan = df.copy()
df_with_nan.loc[1, 'A'] = None
df_with_nan.fillna(0)
```

```
     A   B     C  A_squared   D
0  1.0  10   foo          1  11
1  0.0  20   bar          4  22
2  3.0  30   baz          9  33
3  4.0  40   qux         16  44
4  5.0  50  quux         25  55
```

```python
import pandas as pd

# Example data
data = {
    'date_str': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04'],
    'value': [10, 20, 30, 40]
}

df = pd.DataFrame(data)

# Convert 'date_str' column to datetime
df['date'] = pd.to_datetime(df['date_str'])

print(df)
```

```
     date_str  value       date
0  2023-01-01     10 2023-01-01
1  2023-01-02     20 2023-01-02
2  2023-01-03     30 2023-01-03
3  2023-01-04     40 2023-01-04
```
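
The strings in the example above are in ISO format, which `pd.to_datetime` parses without extra hints. For other layouts it can help to state the format explicitly and to decide what happens to values that cannot be parsed. This is a minimal sketch with made-up day/month/year strings:

```python
import pandas as pd

# Hypothetical input: day/month/year strings plus one invalid entry
raw = pd.Series(['01/02/2023', '15/02/2023', 'not a date'])

# format= states the layout explicitly; errors='coerce' turns
# unparseable entries into NaT instead of raising an exception
parsed = pd.to_datetime(raw, format='%d/%m/%Y', errors='coerce')

print(parsed)
```

The first entry is read as 1 February 2023, not 2 January, because the format string puts the day first.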


```python
import pandas as pd

# Example data
data = {
    'date_str': ['2023-01-01 10:00:00', '2023-01-02 11:00:00', '2023-01-03 12:00:00', '2023-01-04 13:00:00'],
    'value': [10, 20, 30, 40]
}

df = pd.DataFrame(data)

# Convert 'date_str' column to datetime
df['date'] = pd.to_datetime(df['date_str'])

# Extract components
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
df['hour'] = df['date'].dt.hour
df['minute'] = df['date'].dt.minute
df['second'] = df['date'].dt.second

print(df)
```

```
              date_str  value                date  year  month  day  hour  \
0  2023-01-01 10:00:00     10 2023-01-01 10:00:00  2023      1    1    10
1  2023-01-02 11:00:00     20 2023-01-02 11:00:00  2023      1    2    11
2  2023-01-03 12:00:00     30 2023-01-03 12:00:00  2023      1    3    12
3  2023-01-04 13:00:00     40 2023-01-04 13:00:00  2023      1    4    13

   minute  second
0       0       0
1       0       0
2       0       0
3       0       0
```
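
Besides the numeric components extracted above, the `.dt` accessor also offers methods that return derived values. The sketch below shows three of them on two of the timestamps from the example; the variable names are illustrative.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2023-01-01 10:00:00', '2023-01-02 11:00:00']))

# Name of the weekday for each timestamp
weekday = dates.dt.day_name()          # 'Sunday', 'Monday'

# Format each timestamp as a string, here year and month only
year_month = dates.dt.strftime('%Y-%m')   # '2023-01', '2023-01'

# Reset the time of day to midnight, keeping the datetime dtype
date_only = dates.dt.normalize()

print(weekday)
print(year_month)
print(date_only)
```

Note that `strftime` returns strings, so the result can no longer be used with `.dt` until it is converted back with `pd.to_datetime`.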
