<a href="https://www.kaggle.com/code/rajathiagaraj/iris-datapipeline?scriptVersionId=260493363" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [None]:
# C1

import pandas as pd

# Load the dataset from the read-only Kaggle input directory.
# The path assumes the "iriscsv" dataset is attached to this notebook.
df = pd.read_csv('/kaggle/input/iriscsv/Iris.csv')

# --- DEBUGGING STEP ---
# Print the columns to see the exact names in the DataFrame
print("Columns in the DataFrame:")
print(df.columns)
print("-" * 20)

# Column names are case-sensitive. If the printout above shows 'Species'
# (capital 'S'), use that exact name:
# cleaned_df = df.drop(columns=['Species'])

# The code below tries the lowercase name 'species' first and, if that
# raises a KeyError, falls back to a case-insensitive match.
try:
    cleaned_df = df.drop(columns=['species'])
except KeyError:
    # A more robust way to handle it if you're not sure of the case
    print("Could not find 'species' column. Trying a case-insensitive match...")
    # Get all column names as a list
    column_names = [col.lower() for col in df.columns]
    # Find the index of the 'species' column, case-insensitive
    if 'species' in column_names:
        correct_col_name = df.columns[column_names.index('species')]
        cleaned_df = df.drop(columns=[correct_col_name])
        print(f"Successfully dropped column '{correct_col_name}'.")
    else:
        print("Error: The column 'species' was not found in the dataset, even with a case-insensitive search.")
        # You might want to stop the code here or handle it differently
        # For now, let's re-raise the error to stop execution
        raise

# Save the new dataframe to a file
cleaned_df.to_csv('cleaned_iris.csv', index=False)

print('Cleaned dataset saved successfully!')

In [None]:
# C2
import pandas as pd

# Load a file from the read-only input directory
df = pd.read_csv('/kaggle/input/iriscsv/Iris.csv')

# ... do some data processing ...
# Let's say we just want to save the head of the dataframe
df_head = df.head()

# Save the new file to the writable working directory
output_file_path = '/kaggle/working/iris_head.csv'
df_head.to_csv(output_file_path, index=False)

print(f"File saved to: {output_file_path}")

### Summary of Error Rectification in a Kaggle Notebook

This notebook serves as a log for common errors encountered when working with datasets on Kaggle and the methods used to resolve them. The key to successful debugging is understanding the specifics of the Kaggle environment.

---

#### 1. `IsADirectoryError`

**The Error:** `IsADirectoryError: [Errno 21] Is a directory: '/kaggle/input/iris-flower-dataset'`

**The Problem:** This error occurs when a directory is opened as if it were a file. The path `/kaggle/input/iris-flower-dataset` points to a folder, not to a file inside it, and `pd.read_csv` cannot build a DataFrame from a folder.

**The Solution:** The file needs to be explicitly specified within the directory. By inspecting the dataset's contents (via the `Data` panel or the `!ls` command), we found the correct file name (`IRIS.csv`).
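
As a rough alternative to `!ls`, the contents of a dataset folder can also be listed from Python. The sketch below is only illustrative: `list_data_files` is a hypothetical helper name, not part of Kaggle or pandas, and it uses nothing beyond the standard library.

```python
from pathlib import Path


def list_data_files(root):
    """Return every file under `root`, sorted, as POSIX-style paths relative to `root`."""
    root = Path(root)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


# Hypothetical usage inside a Kaggle notebook (folder taken from the error above):
# for name in list_data_files("/kaggle/input/iris-flower-dataset"):
#     print(name)
```

Any name it prints can be appended to the folder path to form the full file path passed to `pd.read_csv`.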

**Corrected Code:**
```python
import pandas as pd
df = pd.read_csv('/kaggle/input/iris-flower-dataset/IRIS.csv')
```

#### 2. `KeyError`

**The Error:** `KeyError: "['species'] not found in axis"`

**The Problem:** This error means that a column with the exact name `'species'` does not exist in the DataFrame. This is typically due to a case-sensitivity issue or a slight spelling difference.

**The Solution:** The best way to resolve this is to inspect the DataFrame's columns immediately after loading the data. The command `df.columns` reveals the exact column names. In this case, the column was capitalized as `'Species'`.

**Debugging Code:**
```python
print("Columns in the DataFrame:")
print(df.columns)
```

**Corrected Code (using the correct capitalization):**
```python
cleaned_df = df.drop(columns=['Species'])
```
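
When the exact spelling is not known in advance, the lookup can be made tolerant of case and stray whitespace. The following is a minimal sketch under that assumption; `resolve_column` is a hypothetical helper, and the column list is illustrative rather than taken from a real file.

```python
def resolve_column(columns, wanted):
    """Return the actual column name matching `wanted`, ignoring case and
    surrounding whitespace. Raise KeyError unless exactly one column matches."""
    target = wanted.strip().lower()
    matches = [col for col in columns if str(col).strip().lower() == target]
    if len(matches) != 1:
        raise KeyError(
            f"expected exactly one column matching {wanted!r}, "
            f"found {matches} among {list(columns)}"
        )
    return matches[0]


# Illustrative names only; `df.columns` can be passed in the same way.
columns = ["Id", "SepalLengthCm", "Species"]
print(resolve_column(columns, "species"))  # prints: Species
```

The returned name can then be passed to `df.drop(columns=[...])`. Raising on zero or multiple matches avoids silently dropping the wrong column.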

#### 3. `FileNotFoundError`

**The Error:** `FileNotFoundError: [Errno 2] No such file or directory: '/kaggle/input/iris-flower-dataset/IRIS.csv'`

**The Problem:** This is a common issue when a new notebook session starts. The environment resets, and the previously attached datasets are temporarily disconnected. The file path becomes invalid until the dataset is re-attached.

**The Solution:** The dataset must be re-added to the notebook via the Kaggle UI. In the "Data" panel on the right, click "+ Add data" and select the required dataset. This action remounts the files, making them accessible at the `/kaggle/input/` path.
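
A small guard at the top of the notebook can surface this problem with a clearer message before any loading happens. This is a sketch only: `require_file` is a hypothetical helper built on the standard library, and the wording of its messages is an assumption.

```python
from pathlib import Path


def require_file(path):
    """Return `path` as a Path if it is an existing file; otherwise raise a clear error."""
    path = Path(path)
    if path.is_dir():
        raise IsADirectoryError(f"{path} is a directory; point to a file inside it")
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; check that the dataset is attached to the notebook"
        )
    return path


# Hypothetical usage before loading:
# csv_path = require_file("/kaggle/input/iris-flower-dataset/IRIS.csv")
# df = pd.read_csv(csv_path)
```

The same check also catches the directory case from section 1, so both path errors are reported in one place.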

#### Key Takeaways and Best Practices

- **Know Your Directories:** Always use `/kaggle/input/` for reading datasets and `/kaggle/working/` for writing output files. `/kaggle/input/` is read-only, while `/kaggle/working/` is read-write.
- **Inspect Your Data:** Use `df.columns` to verify column names and `!ls` to check for files within a directory.
- **Session Management:** Be aware that starting a new session requires re-attaching datasets. Running all cells (Run > Run All) is a good practice to ensure the environment is correctly set up from the start.
- **Committing:** Committing the notebook with "Save & Run All" is crucial for saving your output and creating a reproducible record of your work.
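
The directory convention above can be wrapped in a small helper so that output paths are built in one place. This is a sketch under stated assumptions: `output_path` is a hypothetical name, and the fallback to the current directory is an illustrative choice for running the notebook outside Kaggle.

```python
from pathlib import Path


def output_path(filename, working_dir="/kaggle/working"):
    """Build a path for `filename` under `working_dir`, or under the current
    directory when `working_dir` does not exist (for example, outside Kaggle)."""
    base = Path(working_dir)
    if not base.is_dir():
        base = Path.cwd()
    return base / filename


# Hypothetical usage:
# df_head.to_csv(output_path("iris_head.csv"), index=False)
```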