# pandas portfolio part 5
This part covers pandas data types: which types exist and how to check and convert them. It then covers handling missing data and renaming row and column labels.

In pandas, data types refer to the kind of data that each element in a DataFrame or Series holds. Understanding these data types is crucial because it helps in optimizing performance, performing operations correctly, and ensuring that data is handled appropriately.

| Data Type        | Description                                                                 |
|------------------|-----------------------------------------------------------------------------|
| **`int64`**      | 64-bit integer, used for whole numbers                                      |
| **`float64`**    | 64-bit floating point, used for decimal numbers                             |
| **`object`**     | Typically used for strings or mixed data types                              |
| **`bool`**       | Boolean values (`True`/`False`)                                             |
| **`datetime64[ns]`** | Date and time values                                                    |
| **`timedelta64[ns]`** | Differences between two dates or times                                |
| **`category`**   | Categorical data, used for text or repeated values with a limited number of unique values (efficient storage) |
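The examples below only produce `int64`, `float64`, `object` and `bool` columns. The following is a minimal sketch of how the other three types in the table typically arise. The column names (`when`, `color`, `elapsed`) and values are made up for illustration. It also compares memory use to show why `category` counts as "efficient storage" when values repeat.

```python
import pandas as pd

# hypothetical example data, only for illustrating dtypes
events = pd.DataFrame({
    'when': pd.to_datetime(['2024-01-01', '2024-01-03', '2024-01-10']),
    'color': pd.Series(['red', 'green', 'red'], dtype='category'),
})

# subtracting a timestamp from a datetime column gives a timedelta column
events['elapsed'] = events['when'] - events['when'].iloc[0]
print(events.dtypes)

# category stores each unique value once plus small integer codes,
# so it usually needs far less memory when values repeat a lot
colors = pd.Series(['red', 'green', 'blue'] * 1000)
as_category = colors.astype('category')
print(colors.memory_usage(deep=True), as_category.memory_usage(deep=True))
```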


```python
import pandas as pd

data = {
    'A': [1, 2, 3],
    'B': [4.0, 5.5, 6.1],
    'C': ['a', 'b', 'c'],
    'D': [True, False, True]
}

df = pd.DataFrame(data)
df
```

|   | A | B   | C | D     |
|---|---|-----|---|-------|
| 0 | 1 | 4.0 | a | True  |
| 1 | 2 | 5.5 | b | False |
| 2 | 3 | 6.1 | c | True  |


### how to check data types?
1. `df.dtypes` returns the data type of every column, as a Series indexed by column name.
2. `df['A'].dtype` returns the data type of a single column.
3. `df.index.dtype` returns the data type of the index.

```python
df.dtypes
```

```
A      int64
B    float64
C     object
D       bool
dtype: object
```

```python
df['A'].dtype
```

```
dtype('int64')
```

```python
df.index.dtype
```

```
dtype('int64')
```
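Because `df.dtypes` is itself a Series, it can be iterated to make decisions per column. This is a small sketch that rebuilds the DataFrame above and uses `pd.api.types.is_numeric_dtype` to pick out numeric columns. Note that pandas treats `bool` as numeric here, so `D` is included. The name `numeric_cols` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4.0, 5.5, 6.1],
    'C': ['a', 'b', 'c'],
    'D': [True, False, True],
})

# df.dtypes maps column name -> dtype, so .items() yields (name, dtype) pairs
for name, dtype in df.dtypes.items():
    print(name, dtype, pd.api.types.is_numeric_dtype(dtype))

# collect the columns pandas considers numeric (bool counts as numeric)
numeric_cols = [name for name, dtype in df.dtypes.items()
                if pd.api.types.is_numeric_dtype(dtype)]
print(numeric_cols)
```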

### conversion of datatypes?
Data types can be converted with the `astype()` method when the conversion makes sense for the values, for example from `int64` to `float64`:

```python
# convert column 'A' from int64 to float64 and assign it back
df['A'] = df['A'].astype('float64')
df['A'].dtype
```

```
dtype('float64')
```

```python
df.dtypes
```

```
A    float64
B    float64
C     object
D       bool
dtype: object
```
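When a conversion does not make sense, `astype()` fails instead of guessing. The following minimal sketch uses made-up string values. It shows the `ValueError` raised when one value cannot be parsed as an integer. It then shows `pd.to_numeric(..., errors='coerce')`, which turns unparseable values into `NaN` instead. The result is `float64` because `NaN` is a float.

```python
import pandas as pd

# hypothetical data: the last value is not a number
s = pd.Series(['1', '2', 'x'], dtype=object)

failed = False
try:
    s.astype('int64')  # 'x' cannot be converted, so this raises
except ValueError as err:
    failed = True
    print('conversion failed:', err)

# errors='coerce' replaces values that cannot be parsed with NaN
converted = pd.to_numeric(s, errors='coerce')
print(converted)
```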

---

## missing data in pandas
In pandas, missing data is usually represented by NaN (Not a Number). NaN is a special floating-point value defined by the IEEE 754 standard and available in NumPy as `np.nan`. In `object` columns, `None` is also treated as missing, as column `C` in the example below shows.
#### Functions to Detect Missing Data: `isnull()` and `notnull()`
1. `isnull()` detects missing values. It returns a boolean DataFrame or Series of the same shape as the original. Each entry is `True` if the corresponding value is missing (`NaN`, `None`, or `NaT` for datetimes) and `False` otherwise.

2. `notnull()` is the inverse of `isnull()`. It returns `True` for entries that are not missing and `False` for those that are. (`isna()` and `notna()` are aliases of these two functions.)

These functions can be applied at different levels:

- A single entry: use the top-level `pd.isnull()` on a scalar value, for example `pd.isnull(df.at[0, 'A'])`.
- A column: `df['A'].isnull()` returns a boolean Series.
- A whole DataFrame: `df.isnull()` checks every cell.

A row is not a separate case. Selecting a row with `df.loc[0]` also returns a Series, so `df.loc[0].isnull()` works the same way as it does for a column.

```python
import numpy as np

data = {
    'A': [1, 2, np.nan, 4],
    'B': [np.nan, 2, 3, 4],
    'C': ['foo', 'bar', 'baz', None]
}

df = pd.DataFrame(data)
df
```

|   | A   | B   | C    |
|---|-----|-----|------|
| 0 | 1.0 | NaN | foo  |
| 1 | 2.0 | 2.0 | bar  |
| 2 | NaN | 3.0 | baz  |
| 3 | 4.0 | 4.0 | None |


```python
df.isnull()
```

|   | A     | B     | C     |
|---|-------|-------|-------|
| 0 | False | True  | False |
| 1 | False | False | False |
| 2 | True  | False | False |
| 3 | False | False | True  |


```python
df.notnull()
```

|   | A     | B     | C     |
|---|-------|-------|-------|
| 0 | True  | False | True  |
| 1 | True  | True  | True  |
| 2 | False | True  | True  |
| 3 | True  | True  | False |


```python
df['A'].isnull()
```

```
0    False
1    False
2     True
3    False
Name: A, dtype: bool
```

```python
pd.isnull(df.at[0, 'A'])
```

```
False
```
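This sketch makes the point about rows concrete. It rebuilds the DataFrame above and checks a single row with `.loc`. It then aggregates the `isnull()` result: `sum()` counts missing values per column, and `any(axis=1)` flags rows that contain at least one missing value. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [np.nan, 2, 3, 4],
    'C': ['foo', 'bar', 'baz', None],
})

# a row selected with .loc is a Series, so isnull() works on it too
row_missing = df.loc[2].isnull()

# True counts as 1, so summing the boolean frame counts missing values per column
missing_per_column = df.isnull().sum()

# axis=1 looks across each row: True if any value in that row is missing
rows_with_missing = df.isnull().any(axis=1)

print(row_missing, missing_per_column, rows_with_missing, sep='\n\n')
```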

#### Filling Missing Data: `fillna()`
`fillna()` replaces missing values with a specified value. It returns a new object rather than changing the original, so assign the result to keep it. It can be applied to:

- A single column (Series): fills only the missing values in that column.
- A whole DataFrame: fills the missing values in every column.

```python
df['A'] = df['A'].fillna(0)  # fill all NaN values in column 'A' with 0
df
```

|   | A   | B   | C    |
|---|-----|-----|------|
| 0 | 1.0 | NaN | foo  |
| 1 | 2.0 | 2.0 | bar  |
| 2 | 0.0 | 3.0 | baz  |
| 3 | 4.0 | 4.0 | None |


```python
df_filled = df.fillna(-1)  # fill every missing value in the whole DataFrame with -1
df_filled
```

|   | A   | B    | C   |
|---|-----|------|-----|
| 0 | 1.0 | -1.0 | foo |
| 1 | 2.0 | 2.0  | bar |
| 2 | 0.0 | 3.0  | baz |
| 3 | 4.0 | 4.0  | -1  |
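A single fill value rarely suits every column; `-1` in a text column is a good example. `fillna()` also accepts a dict that maps column names to fill values. This minimal sketch rebuilds the DataFrame and uses three different fill values: `0`, the mean of `B`, and the placeholder string `'missing'`. These are arbitrary choices for illustration, not a recommendation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [np.nan, 2, 3, 4],
    'C': ['foo', 'bar', 'baz', None],
})

# dict maps column name -> fill value; columns not listed are left unchanged
# df['B'].mean() skips NaN, so it is the mean of 2, 3 and 4
filled = df.fillna({'A': 0, 'B': df['B'].mean(), 'C': 'missing'})
print(filled)
```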


---

### renaming in pandas
The `rename()` method changes the row (index) or column labels of a DataFrame without altering the underlying data. It is useful for renaming one or more columns, index labels, or both. By default it returns a new DataFrame and leaves the original unchanged.

1. Renaming Columns Using a Dictionary:

```python
data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6]
}

df = pd.DataFrame(data)

# rename column 'A' to 'Alpha' and 'B' to 'Beta'
df_renamed = df.rename(columns={'A': 'Alpha', 'B': 'Beta'})
df_renamed
```

|   | Alpha | Beta |
|---|-------|------|
| 0 | 1     | 4    |
| 1 | 2     | 5    |
| 2 | 3     | 6    |


2. Renaming Index Labels Using a Dictionary:

```python
df_renamed = df.rename(index={0: 'first', 1: 'second', 2: 'third'})
df_renamed
```

|        | A | B |
|--------|---|---|
| first  | 1 | 4 |
| second | 2 | 5 |
| third  | 3 | 6 |


`rename()` also accepts `inplace=True`. This modifies the DataFrame directly and returns `None` instead of a new DataFrame:

```python
df.rename(columns={'A': 'Alpha', 'B': 'Beta'}, inplace=True)
df
```

|   | Alpha | Beta |
|---|-------|------|
| 0 | 1     | 4    |
| 1 | 2     | 5    |
| 2 | 3     | 6    |
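Two further behaviours of `rename()` are worth knowing. First, the mapping can be a function applied to every label instead of a dict. Second, dict keys that match no label are silently ignored by default, while `errors='raise'` turns that into a `KeyError`. This minimal sketch rebuilds a small DataFrame. The label `'Z'` is deliberately nonexistent.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# a function is applied to every column label
lowered = df.rename(columns=str.lower)
print(lowered.columns.tolist())

# 'Z' is not a column, so by default nothing changes and no error is raised
unchanged = df.rename(columns={'Z': 'Zeta'})
print(unchanged.columns.tolist())

# errors='raise' reports labels that were not found
raised = False
try:
    df.rename(columns={'Z': 'Zeta'}, errors='raise')
except KeyError as err:
    raised = True
    print('rename failed:', err)
```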
