In [None]:
# Run the following code to print multiple outputs from a cell
get_ipython().ast_node_interactivity = 'all'

## Connections 1: Evaluating AI-Generated Python Code for Data Profiling

The goal of this assignment is to help you critically evaluate Python code generated by a generative AI tool. You will be given a sample script that profiles a dataset in Python, but its code differs from the techniques taught in class. Your task is to:

* Identify differences between the AI-generated code and the methods covered in our course.
* Modify the code to align with the techniques and formatting we've practiced.
* Reflect on the risks of relying on generative AI tools for analytics without critical review.

### Part 1. Editing Code

Review the AI-generated code below. It profiles a dataset in Python, but its syntax, structure, and logic differ from what we've covered:

```python
import pandas as pd
import numpy as np

# Load and preview data
df = pd.read_csv("StrikeReportsPartial.csv")
print("Shape:", df.shape)
print("Columns:", df.columns.tolist())
print("First 3 rows:\n", df.head(3))

# Data types and nulls
print("\nData Types:\n", df.dtypes)
null_summary = pd.DataFrame({
    "Missing Count": df.isnull().sum(),
    "Missing %": (df.isnull().sum() / len(df)) * 100
})
print("\nMissing Data Summary:\n", null_summary.sort_values("Missing %", ascending=False))

# Summary statistics
numeric_cols = df.select_dtypes(include=np.number).columns
print("\nNumeric Summary:\n", df[numeric_cols].describe().T)

# Categorical summaries
cat_cols = df.select_dtypes(include="object").columns
for col in cat_cols:
    print(f"\n{col} Value Distribution:")
    print(df[col].value_counts(dropna=False).to_frame(name="Count"))

# Visualizations
import matplotlib.pyplot as plt
fig, axs = plt.subplots(1, 2, figsize=(12, 5))
df["SPEED"].dropna().plot(kind="hist", ax=axs[0], title="Speed Distribution")
df["DAMAGE"].value_counts().plot(kind="bar", ax=axs[1], title="Damage Types")
plt.tight_layout()
plt.show()
```
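
One way to review generated code critically is to run a single step on a tiny dataset whose answer can be worked out by hand. The sketch below applies the script's missing-data logic to a hypothetical four-row DataFrame. The column names `SPEED` and `DAMAGE` come from the script, but the values and the names `toy`, `expected_counts`, and `matches` are invented for illustration and are not taken from `StrikeReportsPartial.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names are borrowed from the script above,
# but the values are invented purely for illustration.
toy = pd.DataFrame({
    "SPEED": [120.0, np.nan, 95.0, np.nan],
    "DAMAGE": ["N", "M", None, "N"],
})

# Same missing-data logic as the generated script, applied to the toy data
null_summary = pd.DataFrame({
    "Missing Count": toy.isnull().sum(),
    "Missing %": toy.isnull().sum() / len(toy) * 100,
})

# Counted by hand: SPEED has 2 of 4 values missing, DAMAGE has 1 of 4
expected_counts = {"SPEED": 2, "DAMAGE": 1}
matches = {
    col: int(null_summary.loc[col, "Missing Count"]) == n
    for col, n in expected_counts.items()
}

print(null_summary)
print(matches)
```

If the hand count and the computed count disagree, that is a sign the step needs a closer look before it is trusted on the real data.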

1. **Identify at least five differences between this code and the methods taught in class.**

[double-click here to type your answer]

a. 

b. 

c. 

d. 

e. 


2. **In the code cell below, rewrite the code using the techniques from our course (refer to the worksheet “03solution_Profiling.ipynb”).**

In [None]:
# edit the code below

import pandas as pd
import numpy as np

# Load and preview data
df = pd.read_csv("StrikeReportsPartial.csv")
print("Shape:", df.shape)
print("Columns:", df.columns.tolist())
print("First 3 rows:\n", df.head(3))

# Data types and nulls
print("\nData Types:\n", df.dtypes)
null_summary = pd.DataFrame({
    "Missing Count": df.isnull().sum(),
    "Missing %": (df.isnull().sum() / len(df)) * 100
})
print("\nMissing Data Summary:\n", null_summary.sort_values("Missing %", ascending=False))

# Summary statistics
numeric_cols = df.select_dtypes(include=np.number).columns
print("\nNumeric Summary:\n", df[numeric_cols].describe().T)

# Categorical summaries
cat_cols = df.select_dtypes(include="object").columns
for col in cat_cols:
    print(f"\n{col} Value Distribution:")
    print(df[col].value_counts(dropna=False).to_frame(name="Count"))

# Visualizations
import matplotlib.pyplot as plt
fig, axs = plt.subplots(1, 2, figsize=(12, 5))
df["SPEED"].dropna().plot(kind="hist", ax=axs[0], title="Speed Distribution")
df["DAMAGE"].value_counts().plot(kind="bar", ax=axs[1], title="Damage Types")
plt.tight_layout()
plt.show()

### Part 2. Reflection

3. **Write 3–5 sentences explaining one potential risk of using generative AI tools for coding without critical review.**

[double-click here to type your answer]
