# Legislation Content Data Manipulation
This notebook demonstrates how to manipulate a CSV file containing legislation content.

In [1]:
import pandas as pd

df = pd.read_csv('legislation_content.csv')

The cleaning cell below performs four data manipulation steps on the DataFrame `df`. Here is a breakdown of each step:

### Rename Columns:
The first two lines rename columns in the DataFrame:

```python
df.rename(columns={'Section': 'Hierarchy'}, inplace=True)
df.rename(columns={'Paragraph': 'Content'}, inplace=True)
```

- 'Section' is renamed to 'Hierarchy'.
- 'Paragraph' is renamed to 'Content'.
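
As a side note, both renames can be done in one call. The sketch below uses a small stand-in DataFrame (the sample row is made up for illustration, not read from the real CSV) and passes `errors='raise'`, so a missing source column raises a `KeyError` instead of being silently ignored.

```python
import pandas as pd

# Hypothetical stand-in for the real CSV; only the column names matter here.
sample = pd.DataFrame({
    'Title': ['No Title'],
    'Section': ['/ukpga/2016/19/contents'],
    'Paragraph': ['Introductory Text'],
})

# One call renames both columns. With errors='raise', pandas raises a
# KeyError if 'Section' or 'Paragraph' is not present in the DataFrame.
renamed = sample.rename(
    columns={'Section': 'Hierarchy', 'Paragraph': 'Content'},
    errors='raise',
)
print(list(renamed.columns))
```

Returning a new DataFrame rather than using `inplace=True` also leaves the original untouched, which makes a notebook cell safer to re-run.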

### Handle Missing Values in the 'Title' Column:
The next two lines handle titles recorded as the sentinel string 'No Title' (empty cells, which pandas reads as NaN, are not detected here):

```python
df['Title'] = df['Title'].replace('No Title', 'Unknown Title')
df['Title_Missing'] = df['Title'].apply(lambda x: 1 if x == 'Unknown Title' else 0)
```

- Replace occurrences of 'No Title' with 'Unknown Title'.
- Create a new column 'Title_Missing' that flags missing titles (1 if the title is 'Unknown Title', otherwise 0). Because the flag is a string comparison, a title that is NaN gets 0.
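
If the CSV could also contain empty title cells, a variant along these lines would flag both cases. This is a minimal sketch on made-up sample values, not the notebook's own method; it assumes that empty cells and 'No Title' should be treated the same way.

```python
import pandas as pd

# Hypothetical sample: a sentinel string, a truly missing value, a real title.
sample = pd.DataFrame({'Title': ['No Title', None, 'Example Title']})

# A title counts as missing if it is NaN/None or the 'No Title' sentinel.
is_missing = sample['Title'].isna() | (sample['Title'] == 'No Title')
sample['Title_Missing'] = is_missing.astype(int)

# Use the same placeholder for both kinds of missing title.
sample['Title'] = sample['Title'].mask(is_missing, 'Unknown Title')
print(sample)
```

Computing the flag before overwriting the titles means the indicator does not depend on the placeholder text.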

### Handle Missing Values in Other Columns:
The following line fills every remaining NaN in the DataFrame with the placeholder string 'Unknown'. It applies to all columns, including any NaN still left in 'Title':

```python
df.fillna('Unknown', inplace=True)
```
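
Calling `fillna` on the whole DataFrame touches every column. If the placeholder should only go into specific columns, `fillna` also accepts a dict that maps column names to fill values. The sketch below uses made-up sample rows and assumes only 'Hierarchy' and 'Content' should be filled.

```python
import pandas as pd

# Hypothetical sample with gaps in two text columns.
sample = pd.DataFrame({
    'Hierarchy': ['/ukpga/2016/19/contents', None],
    'Content': [None, 'Introductory Text'],
    'Title_Missing': [1, 0],
})

# A dict limits the fill to the named columns; other columns are left as-is.
filled = sample.fillna({'Hierarchy': 'Unknown', 'Content': 'Unknown'})
print(filled)
```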

### Reorder Columns:
The final line selects 'Title', 'Hierarchy', 'Content', and 'Title_Missing', in that order. Because only these four columns are listed, any other columns in the DataFrame are dropped:

```python
df = df[['Title', 'Hierarchy', 'Content', 'Title_Missing']]
```
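
The selection above keeps only the four listed columns. If the goal is to move them to the front while keeping everything else, one option is to append the remaining columns to the list. The sketch below uses a made-up DataFrame with a hypothetical 'Extra' column to show the effect.

```python
import pandas as pd

# Hypothetical sample with one extra column and a scrambled column order.
sample = pd.DataFrame({
    'Extra': ['x'],
    'Content': ['Introductory Text'],
    'Title_Missing': [1],
    'Hierarchy': ['/ukpga/2016/19/contents'],
    'Title': ['Unknown Title'],
})

# Columns to place first, followed by every other column in its existing order.
front = ['Title', 'Hierarchy', 'Content', 'Title_Missing']
rest = [col for col in sample.columns if col not in front]
reordered = sample[front + rest]
print(list(reordered.columns))
```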

After these steps the DataFrame contains exactly the columns 'Title', 'Hierarchy', 'Content', and 'Title_Missing', in that order, with no NaN values left in the text columns.

In [3]:
# Rename 'Section' to 'Hierarchy'
df.rename(columns={'Section': 'Hierarchy'}, inplace=True)

# Rename 'Paragraph' to 'Content'
df.rename(columns={'Paragraph': 'Content'}, inplace=True)

# Handle missing values in the 'Title' column
# Replace 'No Title' with a placeholder (e.g., 'Unknown Title')
df['Title'] = df['Title'].replace('No Title', 'Unknown Title')

# Create a new feature indicating missing titles
df['Title_Missing'] = df['Title'].apply(lambda x: 1 if x == 'Unknown Title' else 0)

# Fill any remaining NaN values, in every column, with a placeholder (e.g., 'Unknown')
df.fillna('Unknown', inplace=True)

# Keep only Title, Hierarchy, Content, and Title_Missing, in that order
# (any other columns are dropped)
df = df[['Title', 'Hierarchy', 'Content', 'Title_Missing']]


In [5]:
# Save the modified DataFrame to a new CSV file
df.to_csv('legislation_content_renamed.csv', index=False)

# Display the DataFrame to verify the changes
print("Final DataFrame:")
print(df.head())

Final DataFrame:
           Title                Hierarchy  \
0  Unknown Title  /ukpga/2016/19/contents   
1  Unknown Title  /ukpga/2016/19/contents   
2  Unknown Title  /ukpga/2016/19/contents   
3  Unknown Title  /ukpga/2016/19/contents   
4  Unknown Title  /ukpga/2016/19/contents   

                                    Content  Title_Missing  
0                         Introductory Text              1  
1  PART 1 Labour market and illegal working              1  
2                   CHAPTER 1 Labour market              1  
3     Director of Labour Market Enforcement              1  
4   1.Director of Labour Market Enforcement              1  
