# -------------- **Pandas Data Manipulation** --------------

                             "29 January 2024"   - Akanksha              

Pandas is a popular open-source data manipulation and analysis library for Python. It provides high-performance, easy-to-use data structures such as DataFrame and Series, along with tools for reading, writing, and manipulating data. In this detailed overview, we'll cover various aspects of Pandas data manipulation, including data structures, indexing, selection, filtering, grouping, merging, and more.

## 1. **Data Structures in Pandas:**

### a. **Series:**
   - A one-dimensional labeled array capable of holding any data type.
   - Created using `pd.Series(data, index)` where `data` can be a list, NumPy array, or dictionary.



In [4]:
import pandas as pd

# Creating a Series
s = pd.Series([1, 3, 4, 5, 6, 8], index=['a', 'b', 'c', 'd', 'e', 'f'])
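
The list above also mentions dictionaries as input. As a minimal sketch (the names `prices`, `apple_price`, and `doubled` and the values are illustrative assumptions, not part of the original notebook), the dictionary keys become the index labels:

```python
import pandas as pd

# Building a Series from a dictionary: the keys become the index labels
prices = pd.Series({'apple': 1.5, 'banana': 0.25, 'cherry': 3.0})

# Label-based access to a single value
apple_price = prices['apple']

# Element-wise arithmetic keeps the labels attached to the values
doubled = prices * 2
```

Because the labels travel with the values, `doubled['banana']` refers to the same item as `prices['banana']`.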

### b. **DataFrame:**
   - A two-dimensional labeled data structure with columns that can be of different data types.
   - Can be created from a dictionary, NumPy array, or other DataFrames.


In [5]:
# Creating a DataFrame
df = pd.DataFrame({
    'Column1': [1, 2, 3, 4],
    'Column2': ['A', 'B', 'C', 'D']
})
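
Once a DataFrame exists, a few attributes help confirm its structure. The sketch below rebuilds the same `df` so it can run on its own; the variable names on the left-hand side are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    'Column1': [1, 2, 3, 4],
    'Column2': ['A', 'B', 'C', 'D']
})

# shape is a (rows, columns) tuple
n_rows, n_cols = df.shape

# columns holds the column labels
column_names = list(df.columns)

# head(n) returns the first n rows as a new DataFrame
first_two = df.head(2)

# Each column has its own dtype; Column1 is numeric, Column2 is not
column1_is_numeric = pd.api.types.is_numeric_dtype(df['Column1'])
column2_is_numeric = pd.api.types.is_numeric_dtype(df['Column2'])
```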


## 2. **Indexing and Selection:**

### a. **Selection by Label:**
   - Use `.loc[]` for label-based indexing.





In [6]:
# Selecting a row
df.loc[0]

# Selecting a specific element
df.loc[0, 'Column1']

1
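
With the default integer index, labels and positions happen to coincide, which can hide what `.loc[]` does. The following sketch uses string labels instead (the frame `df_labeled` and its index values are assumptions made for illustration):

```python
import pandas as pd

df_labeled = pd.DataFrame(
    {'Column1': [1, 2, 3, 4], 'Column2': ['A', 'B', 'C', 'D']},
    index=['w', 'x', 'y', 'z']
)

# Label slices with .loc include BOTH endpoints ('x' and 'y')
subset = df_labeled.loc['x':'y']

# A row label and a column label together select a single element
value = df_labeled.loc['y', 'Column2']
```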

### b. **Selection by Position:**
   - Use `.iloc[]` for positional indexing.





In [7]:
# Selecting a row
df.iloc[0]

# Selecting a specific element
df.iloc[0, 1]

'A'
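
The difference between `.iloc[]` and `.loc[]` is easiest to see when the index labels are not 0, 1, 2, and so on. This is a small sketch under that assumption (the frame `df_shifted` and its labels are hypothetical):

```python
import pandas as pd

df_shifted = pd.DataFrame(
    {'Column1': [1, 2, 3, 4], 'Column2': ['A', 'B', 'C', 'D']},
    index=[10, 20, 30, 40]
)

# Position 0 is the first row, whatever its label is
by_position = df_shifted.iloc[0]

# Label 10 is the row whose index value is 10 (the same row here)
by_label = df_shifted.loc[10]

# Positional slices exclude the end position, like Python lists
first_two = df_shifted.iloc[0:2]
```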

## 3. **Data Manipulation:**

### a. **Filtering:**
   - Use boolean indexing to filter rows based on conditions.





In [8]:
# Filtering rows where Column1 is greater than 2
df[df['Column1'] > 2]

|   | Column1 | Column2 |
|---|---------|---------|
| 2 | 3       | C       |
| 3 | 4       | D       |
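
Conditions can also be combined. The sketch below assumes the same `df` as above and shows `&` (and), `|` (or), and `isin()`; each condition needs its own parentheses because `&` and `|` bind more tightly than comparisons in Python.

```python
import pandas as pd

df = pd.DataFrame({
    'Column1': [1, 2, 3, 4],
    'Column2': ['A', 'B', 'C', 'D']
})

# Rows where BOTH conditions hold
both = df[(df['Column1'] > 1) & (df['Column2'] != 'D')]

# Rows where EITHER condition holds
either = df[(df['Column1'] == 1) | (df['Column2'] == 'D')]

# Rows whose Column2 value is in a given list
members = df[df['Column2'].isin(['A', 'C'])]
```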


### b. **Adding and Removing Columns:**
   - Use the `[]` operator to add a new column.
   - Use `drop()` with `axis=1` to remove a column.





In [9]:
# Adding a new column
df['NewColumn'] = df['Column1'] * 2

# Removing a column
df.drop('NewColumn', axis=1, inplace=True)
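
As an alternative to modifying `df` in place, the same two steps can be written so that each returns a new DataFrame. This is only a sketch of that style, with `with_new` and `without_new` as illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'Column1': [1, 2, 3, 4],
    'Column2': ['A', 'B', 'C', 'D']
})

# assign() returns a copy with the extra column; df itself is unchanged
with_new = df.assign(NewColumn=df['Column1'] * 2)

# drop(columns=...) is equivalent to drop(..., axis=1) and also returns a copy
without_new = with_new.drop(columns='NewColumn')
```

Keeping the original untouched can make a notebook easier to re-run, since a cell does not depend on whether an earlier cell has already mutated `df`.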

### c. **Missing Data:**
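   - Handle missing data using methods such as `dropna()` and `fillna()`. Both return a new DataFrame and leave the original unchanged unless the result is assigned back.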
  




In [10]:
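# Drop rows with missing values (returns a new DataFrame)
df.dropna()

# Fill missing values with a specific value (returns a new DataFrame)
# df has no missing values here, so the result is identical to df
df.fillna(value=0)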

|   | Column1 | Column2 |
|---|---------|---------|
| 0 | 1       | A       |
| 1 | 2       | B       |
| 2 | 3       | C       |
| 3 | 4       | D       |
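
Since the example `df` contains no missing values, the effect of these methods is not visible in the output above. The sketch below uses a hypothetical frame, `df_missing`, with one gap in each column:

```python
import numpy as np
import pandas as pd

df_missing = pd.DataFrame({
    'Column1': [1.0, np.nan, 3.0, 4.0],
    'Column2': ['A', 'B', None, 'D']
})

# Count missing values per column
missing_per_column = df_missing.isna().sum()

# Keep only rows with no missing values (rows 0 and 3 here)
dropped = df_missing.dropna()

# Fill each column with its own replacement value
filled = df_missing.fillna({'Column1': 0, 'Column2': 'unknown'})
```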


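## 4. **Grouping and Aggregation:**

   - Use `groupby()` to split rows by the values of a column, then apply an aggregation such as `mean()` to each group.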




In [11]:
# Grouping by a column and calculating mean
df.groupby('Column2').mean()

| Column2 | Column1 |
|---------|---------|
| A       | 1.0     |
| B       | 2.0     |
| C       | 3.0     |
| D       | 4.0     |
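
In the example above every group holds exactly one row, so the mean simply repeats the original value. A sketch with repeated group keys makes the aggregation visible (the `sales` frame, its column names, and its values are assumptions made for illustration):

```python
import pandas as pd

sales = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'B'],
    'Amount': [10, 20, 5, 15, 10]
})

# Select the column to aggregate, then apply several functions at once
summary = sales.groupby('Category')['Amount'].agg(['sum', 'mean', 'count'])
```

Here `summary` has one row per category and one column per aggregation function.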


## 5. **Merging and Concatenating:**

### a. **Concatenation:**
   - Combine DataFrames along rows or columns.




In [13]:
df1 = pd.DataFrame({
    'Column1': [1, 2, 3, 4],
    'Column2': ['A', 'B', 'C', 'D']
})

df2 = pd.DataFrame({
    'Column1': [5, 6, 7, 8],
    'Column2': ['E', 'F', 'G', 'H']
})

# Concatenating along rows
pd.concat([df1, df2])

# Concatenating along columns
pd.concat([df1, df2], axis=1)

|   | Column1 | Column2 | Column1 | Column2 |
|---|---------|---------|---------|---------|
| 0 | 1       | A       | 5       | E       |
| 1 | 2       | B       | 6       | F       |
| 2 | 3       | C       | 7       | G       |
| 3 | 4       | D       | 8       | H       |
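
When concatenating along rows, each input keeps its own index labels, so the result can contain duplicates. The following sketch, using the same `df1` and `df2`, shows one way to renumber the rows with `ignore_index=True`:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Column1': [1, 2, 3, 4],
    'Column2': ['A', 'B', 'C', 'D']
})
df2 = pd.DataFrame({
    'Column1': [5, 6, 7, 8],
    'Column2': ['E', 'F', 'G', 'H']
})

# Default: original labels are kept, so 0..3 appears twice
stacked = pd.concat([df1, df2])

# ignore_index=True discards the old labels and numbers rows 0..7
renumbered = pd.concat([df1, df2], ignore_index=True)
```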


### b. **Merging:**
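   - Merge DataFrames based on a key column. By default, `pd.merge` performs an inner join, keeping only rows whose key value appears in both DataFrames.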




In [17]:
# Merging two DataFrames on Column1 (inner join by default).
# df1 and df2 share no Column1 values, so the result is empty.
pd.merge(df1, df2, on='Column1')

|   | Column1 | Column2_x | Column2_y |
|---|---------|-----------|-----------|
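
To see a merge that actually matches rows, the inputs need overlapping key values. The sketch below uses two hypothetical frames, `left` and `right`, that share the keys 2 and 3, and compares three values of the `how` parameter:

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3], 'left_value': ['A', 'B', 'C']})
right = pd.DataFrame({'key': [2, 3, 4], 'right_value': ['X', 'Y', 'Z']})

# Inner join (the default): only keys present in both inputs
inner = pd.merge(left, right, on='key')

# Outer join: every key from either input, with NaN where a side is missing
outer = pd.merge(left, right, on='key', how='outer')

# Left join: every key from `left`, matched against `right` where possible
left_join = pd.merge(left, right, on='key', how='left')
```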


## 6. **Reading and Writing Data:**

### a. **Reading Data:**
   - Pandas supports various file formats such as CSV, Excel, SQL, etc.

```python
# Reading CSV
df = pd.read_csv('file.csv')

# Reading Excel (needs an Excel engine such as openpyxl for .xlsx files)
df = pd.read_excel('file.xlsx')
```
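
The calls above depend on files that may not exist on your machine. As a self-contained sketch, `pd.read_csv` also accepts a file-like object, so the same function can be tried on an in-memory string (the CSV text here is made up for illustration):

```python
import io
import pandas as pd

csv_text = "Column1,Column2\n1,A\n2,B\n3,C\n"

# Read the whole text as if it were a file
df_from_text = pd.read_csv(io.StringIO(csv_text))

# usecols limits the columns that are read; nrows limits the rows
only_first = pd.read_csv(io.StringIO(csv_text), usecols=['Column1'], nrows=2)
```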




### b. **Writing Data:**
   - Save DataFrames to various file formats.



In [None]:
# Writing to CSV
df.to_csv('output.csv', index=False)

# Writing to Excel (needs an Excel engine such as openpyxl to be installed)
df.to_excel('output.xlsx', index=False)
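
A quick way to check what `to_csv` produces is to write to an in-memory buffer and read the result back. This is a minimal round-trip sketch that avoids touching the file system; `buffer`, `csv_text`, and `round_trip` are illustrative names:

```python
import io
import pandas as pd

df = pd.DataFrame({
    'Column1': [1, 2, 3, 4],
    'Column2': ['A', 'B', 'C', 'D']
})

# Write the DataFrame to an in-memory text buffer instead of a file
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Read the text back to confirm the data survives the round trip
round_trip = pd.read_csv(io.StringIO(csv_text))
```

With `index=False` the row labels are not written, so the first line of `csv_text` contains only the column names.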