#### 🧠 What is pandas?

**pandas** is a Python library used to:

* Load, inspect, and clean data
* Analyze and manipulate data tables
* Handle missing values
* Perform group operations and filtering

It is widely used in data science, especially for **exploratory data analysis (EDA)**.

---

#### 📘 What You’ll Learn in the Kaggle Pandas Course

#### 1. **Creating, Reading, and Writing Data**

* Load data from CSV files using `pd.read_csv()`
* View the data with `.head()` and `.info()`
* Save data using `.to_csv()`

```python
import pandas as pd

data = pd.read_csv("file.csv")             # load a CSV file into a DataFrame
data.to_csv("new_file.csv", index=False)   # index=False leaves the row index out of the file
```
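The bullets above also mention `.head()` and `.info()`, which the snippet does not show. Here is a minimal sketch, assuming a small made-up CSV held in memory so that it runs without a file on disk. The columns `name`, `age`, and `city` are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice this would be a path such as "file.csv".
csv_text = "name,age,city\nAna,34,Lima\nBen,28,Oslo\nCho,41,Lima\n"

data = pd.read_csv(io.StringIO(csv_text))

print(data.head(2))  # first two rows
data.info()          # column names, non-null counts, and dtypes

# Write to an in-memory buffer and read it back to check the round trip.
out = io.StringIO()
data.to_csv(out, index=False)
round_trip = pd.read_csv(io.StringIO(out.getvalue()))
```

Without `index=False`, the saved file gains an extra unnamed first column holding the row index.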

---

#### 2. **DataFrame Basics**

* A **DataFrame** is a two-dimensional labeled table, similar to a spreadsheet in Excel
* Access columns: `data['column_name']`
* Summarize a column: `.describe()`, `.mean()`, `.unique()`

```python
data['age'].mean()
data['category'].unique()
```
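As a self-contained illustration of these calls, the sketch below builds a tiny DataFrame by hand. The values are invented for the example.

```python
import pandas as pd

# Hypothetical data for illustration only.
data = pd.DataFrame({
    "age": [20, 30, 40, 30],
    "category": ["a", "b", "a", "c"],
})

summary = data["age"].describe()            # count, mean, std, min, quartiles, max
mean_age = data["age"].mean()
categories = data["category"].unique()      # distinct values, in order of first appearance
n_categories = data["category"].nunique()   # number of distinct values
```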

---

#### 3. **Indexing, Selecting, and Assigning**

* Select rows using `.loc[]` and `.iloc[]`
* Filter data with conditions
* Add new columns

```python
# Select a row by index label
data.loc[0]

# Select a row by integer position (the fourth row)
data.iloc[3]

# Filter rows
data[data['age'] > 30]

# Add a new column
data['age_in_10_years'] = data['age'] + 10
```
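With the default index (0, 1, 2, ...), `.loc[]` and `.iloc[]` often return the same row, which hides the difference between them. The sketch below assumes a DataFrame with non-default index labels (the labels 10, 20, 30 are made up) to show that `.loc[]` looks up labels while `.iloc[]` looks up positions.

```python
import pandas as pd

data = pd.DataFrame(
    {"age": [25, 35, 45]},
    index=[10, 20, 30],  # hypothetical non-default index labels
)

by_label = data.loc[10]      # the row whose index label is 10
by_position = data.iloc[0]   # the first row, whatever its label

# A boolean filter and a column selection combined in one .loc call.
older_ages = data.loc[data["age"] > 30, "age"]
```

Here `data.loc[0]` would raise a `KeyError`, because no row carries the label 0.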

---

#### 4. **Summary Functions and Maps**

* Use `.value_counts()` and `.map()` for analysis
* Apply functions to columns: `.apply()`

```python
data['income_bracket'] = data['income'].map(lambda x: 'High' if x > 50000 else 'Low')
```
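The snippet above covers `.map()` only. Here is a short sketch of `.value_counts()` and `.apply()` on the same kind of data, assuming an invented `income` column.

```python
import pandas as pd

data = pd.DataFrame({"income": [30000, 60000, 80000, 45000]})  # hypothetical values

# Series.map applies the function to each value of one column.
data["income_bracket"] = data["income"].map(
    lambda x: "High" if x > 50000 else "Low"
)

# Count how many rows fall into each bracket.
counts = data["income_bracket"].value_counts()

# DataFrame.apply with axis=1 passes each row to the function,
# so the function can use several columns at once.
data["label"] = data.apply(
    lambda row: f"{row['income_bracket']}:{row['income']}", axis=1
)
```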

---

#### 5. **Grouping and Sorting**

* Use `.groupby()` to group data and compute summaries
* Sort with `.sort_values()`

```python
data.groupby('city')['price'].mean()
data.sort_values('price', ascending=False)
```
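To make the grouping step concrete, the sketch below uses a hand-built table with hypothetical `city` and `price` values, computes several summaries per group with `.agg()`, and sorts the group means.

```python
import pandas as pd

# Hypothetical data for illustration only.
data = pd.DataFrame({
    "city": ["Lima", "Oslo", "Lima", "Oslo"],
    "price": [10.0, 30.0, 20.0, 50.0],
})

mean_price = data.groupby("city")["price"].mean()                   # one mean per city
stats = data.groupby("city")["price"].agg(["min", "max", "count"])  # several summaries at once
ranked = mean_price.sort_values(ascending=False)                    # highest mean first
```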

---

#### 6. **Data Types and Missing Values**

* Check data types with `.dtypes`
* Convert types using `.astype()`
* Handle missing values with `.isnull()`, `.fillna()`, `.dropna()`

```python
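# Assign the result back: calling fillna(inplace=True) on a selected column raises a
# FutureWarning in recent pandas 2.x and does not update `data` under Copy-on-Write.
data['age'] = data['age'].fillna(data['age'].mean())
data.dropna(subset=['income'], inplace=True)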
```
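The bullets also list `.dtypes`, `.astype()`, and `.isnull()`. Here is a minimal end-to-end sketch, assuming a small invented table with one missing value in each column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column.
data = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "income": [1000.0, 2000.0, np.nan],
})

print(data.dtypes)                        # both columns start as float64
missing_per_column = data.isnull().sum()  # number of missing values in each column

# Fill the missing age with the column mean, then convert to integers.
data["age"] = data["age"].fillna(data["age"].mean())
data["age"] = data["age"].astype(int)

# Drop the rows that still lack an income.
data = data.dropna(subset=["income"])
```

The conversion to `int` comes after the fill because a column that contains `NaN` cannot be cast to a plain integer dtype.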

---

#### 7. **Renaming and Combining**

* Rename columns: `.rename()`
* Combine DataFrames: `pd.concat()`, `.merge()`

```python
data.rename(columns={'old_name': 'new_name'}, inplace=True)
# df1 and df2 stand for two existing DataFrames with the same columns
combined = pd.concat([df1, df2])
```
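Here is a runnable sketch of the three calls together. The frames `df1`, `df2`, and `prices`, and their columns, are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical frames for illustration only.
df1 = pd.DataFrame({"id": [1, 2], "old_name": ["a", "b"]})
df2 = pd.DataFrame({"id": [3, 4], "old_name": ["c", "d"]})
prices = pd.DataFrame({"id": [1, 2, 3], "price": [9.5, 7.0, 3.2]})

# rename returns a new DataFrame unless inplace=True is passed.
df1 = df1.rename(columns={"old_name": "new_name"})
df2 = df2.rename(columns={"old_name": "new_name"})

# concat stacks rows; ignore_index=True renumbers the result 0..n-1.
combined = pd.concat([df1, df2], ignore_index=True)

# merge joins on a key column; how="left" keeps every row of `combined`.
merged = combined.merge(prices, on="id", how="left")
```

Rows with no match in `prices` (here `id` 4) get `NaN` in the `price` column.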


---

#### Practice

In [1]:
import pandas as pd

help(pd.DataFrame().map)

Help on method map in module pandas.core.frame:

map(func: 'PythonFuncType', na_action: 'str | None' = None, **kwargs) -> 'DataFrame' method of pandas.core.frame.DataFrame instance
    Apply a function to a Dataframe elementwise.

    .. versionadded:: 2.1.0

       DataFrame.applymap was deprecated and renamed to DataFrame.map.

    This method applies a function that accepts and returns a scalar
    to every element of a DataFrame.

    Parameters
    ----------
    func : callable
        Python function, returns a single value from a single value.
    na_action : {None, 'ignore'}, default None
        If 'ignore', propagate NaN values, without passing them to func.
    **kwargs
        Additional keyword arguments to pass as keywords arguments to
        `func`.

    Returns
    -------
    DataFrame
        Transformed DataFrame.

    See Also
    --------
    DataFrame.apply : Apply a function along input axis of DataFrame.
    DataFrame.replace: Replace values given in `to_repla

In [8]:
my_dict = {"1": [1, 2], "2": ["1", "2"]}
df = pd.DataFrame(my_dict)
df.columns

Index(['1', '2'], dtype='object')

In [9]:
df["12"] = df["1"].map(lambda x: str(x))
df

|   | 1 | 2 | 12 |
|---|---|---|----|
| 0 | 1 | 1 | 1  |
| 1 | 2 | 2 | 2  |


In [10]:
df["1"].value_counts()

1
1    1
2    1
Name: count, dtype: int64

In [11]:
df.rename(columns={"12": "3"}, inplace=True)
df

|   | 1 | 2 | 3 |
|---|---|---|---|
| 0 | 1 | 1 | 1 |
| 1 | 2 | 2 | 2 |


In [12]:
df["3"] = df["3"].astype(int)

In [13]:
df.dtypes

1     int64
2    object
3     int64
dtype: object

In [15]:
df = df.astype(int)
df.dtypes

1    int64
2    int64
3    int64
dtype: object

In [None]:
my_dict_1 = {"1": [1, 2], "2": ["1", "2"]}
my_dict_2 = {"1": [1, 2], "2": ["1", "2"]}
new_df = pd.concat([pd.DataFrame(my_dict_1), pd.DataFrame(my_dict_2)])
new_df

|   | 1 | 2 |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 2 | 2 |
| 0 | 1 | 1 |
| 1 | 2 | 2 |


In [19]:
my_dict_1 = {"1": [1, 2], "no": ["1", "2"]}
my_dict_2 = {"1": [1, 2], "no": ["1", "2"]}
df_1 = pd.DataFrame(my_dict_1)
df_2 = pd.DataFrame(my_dict_2)

df_3 = df_2.merge(df_1, how="inner", on="no")
df_3

|   | 1_x | no | 1_y |
|---|-----|----|-----|
| 0 | 1   | 1  | 1   |
| 1 | 2   | 2  | 2   |
