# AI-generated Code for Supervised Learning

## Checklist for the Prompt

1. Analytic method: logistic regression as a supervised learning method for prediction
2. Variable names: four independent variables (IVs) and one dependent variable (DV)
3. (If needed) Pre-processing: converting the values of Sentiment (positive -> +1, negative -> -1)
4. Data file name (or path): a csv data file
5. Requesting the Python code for:
    1. accuracy calculation
    2. the predicted value of a new observation (with length of 200, rating of 5, negative sentiment, and 5 helpful votes)
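
The pre-processing in item 3 can be checked before any model is fitted. The sketch below is only an illustration: `sample` is a made-up four-row DataFrame, not the course data, and `n_unmapped` is a hypothetical helper name. It relies on the fact that pandas `Series.map` yields `NaN` for any label missing from the mapping, so counting `NaN` values reveals typos or unexpected categories such as "Positive" with a capital P.

```python
import pandas as pd

# Hypothetical mini-sample standing in for the review data (not real rows)
sample = pd.DataFrame({
    "Sentiment": ["positive", "negative", "negative", "positive"],
})

mapping = {"positive": 1, "negative": -1}

# Series.map returns NaN for any label that is missing from the mapping,
# so counting NaNs catches typos or unexpected categories early
converted = sample["Sentiment"].map(mapping)
n_unmapped = int(converted.isna().sum())

print(converted.tolist())
print(f"Unmapped labels: {n_unmapped}")
```

If `n_unmapped` is greater than zero on the real file, the offending labels should be cleaned up before training, since scikit-learn estimators such as `LogisticRegression` reject `NaN` inputs.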


## Prompt Template

> I want to run **logistic regression** as supervised learning for prediction, using Rating, Length, Sentiment, and Votes as independent variables and SalesUp as the dependent variable. The Sentiment variable has values of "positive" and "negative", which need to be converted to +1 and -1, respectively. All of these variables are in my data file, **xxxx.csv**. (1) Can you help me generate Python code to calculate the accuracy of this predictive model? (2) Also, I want to know the predicted value of SalesUp when a new observation has a Rating of 5, a Length of 200, negative Sentiment, and 5 Votes.

This demo uses the same Amazon review dataset as in Week 5, which you downloaded from [here](https://github.com/YuxiaoLuo/AI_Intro/blob/main/data/week4_Thu_PythonAnalytics_review%20data%20example.csv).

## Result Table

- Run the AI-generated Python code and record the results in the table below.

| **ML Models**                | **Accuracy** | **Predicted Value** |
|------------------------------|-------------|----------------------|
| Logistic Regression          |             |                      |
| Support Vector Machine       |             |                      |
| Nearest Neighbor (K = 5)     |             |                      |
| Neural Network               |             |                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/YuxiaoLuo/AI_Intro/blob/main/week7_supervised_learning.ipynb)

## Logistic Regression

In [None]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the data
data = pd.read_csv('data/week4_Thu_PythonAnalytics_review data example.csv')

# Convert Sentiment to numeric (+1 for positive, -1 for negative)
data['Sentiment'] = data['Sentiment'].map({'positive': 1, 'negative': -1})

# Define independent variables and dependent variable
X = data[['Rating', 'Length', 'Sentiment', 'Votes']]
y = data['SalesUp']

# Split data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

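# Create and fit the logistic regression model
# (max_iter is raised from the default of 100 because the solver may not converge on unscaled features)
model = LogisticRegression(max_iter=1000)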
model.fit(X_train, y_train)

# Predict on the test set and calculate accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%")

# Predict SalesUp for a new observation
new_observation = pd.DataFrame({
    'Rating': [5],
    'Length': [200],
    'Sentiment': [-1],  # negative sentiment
    'Votes': [5]
})

predicted_salesup = model.predict(new_observation)
print(f"Predicted SalesUp for new observation: {predicted_salesup[0]}")
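
The script above prints only the predicted class label. When the confidence behind that label matters, scikit-learn's `predict_proba` reports one probability per class. The following is a minimal sketch under stated assumptions: SalesUp is assumed to be coded 0/1, and `demo` is a synthetic, made-up DataFrame used in place of the real CSV so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the review data (made-up values, for illustration only)
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Rating": rng.integers(1, 6, size=n),
    "Length": rng.integers(20, 500, size=n),
    "Sentiment": rng.choice([-1, 1], size=n),
    "Votes": rng.integers(0, 30, size=n),
})
# Made-up labeling rule so that the target depends on the features
demo["SalesUp"] = ((demo["Rating"] + 2 * demo["Sentiment"]) >= 4).astype(int)

features = ["Rating", "Length", "Sentiment", "Votes"]
model = LogisticRegression(max_iter=1000)
model.fit(demo[features], demo["SalesUp"])

new_observation = pd.DataFrame(
    {"Rating": [5], "Length": [200], "Sentiment": [-1], "Votes": [5]}
)

# predict() returns the class label; predict_proba() returns one probability
# per class, with columns ordered as in model.classes_
label = model.predict(new_observation)[0]
proba = model.predict_proba(new_observation)[0]

for cls, p in zip(model.classes_, proba):
    print(f"P(SalesUp = {cls}) = {p:.3f}")
print(f"Predicted label: {label}")
```

The probabilities printed here come from the synthetic rows and say nothing about the real dataset; swapping `demo` for the loaded CSV gives the figures that belong in the result table.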

## SVM Model

In [None]:
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the data (the same review dataset; change 'ReviewData.csv' to match your file name or path)
data = pd.read_csv('ReviewData.csv')

# Convert Sentiment to numeric (+1 for positive, -1 for negative)
data['Sentiment'] = data['Sentiment'].map({'positive': 1, 'negative': -1})

# Define independent variables and dependent variable
X = data[['Rating', 'Length', 'Sentiment', 'Votes']]
y = data['SalesUp']

# Split data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and fit the SVM model
svm_model = SVC(kernel='linear', random_state=42)
svm_model.fit(X_train, y_train)

# Predict on the test set and calculate accuracy
y_pred = svm_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"SVM Model Accuracy: {accuracy * 100:.2f}%")

# Predict SalesUp for a new observation
new_observation = pd.DataFrame({
    'Rating': [5],
    'Length': [200],
    'Sentiment': [-1],  # negative sentiment
    'Votes': [5]
})

predicted_salesup = svm_model.predict(new_observation)
print(f"Predicted SalesUp for new observation: {predicted_salesup[0]}")
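
Support vector machines are sensitive to feature scale, and Length typically spans a much wider range than Rating or Sentiment. One common variation, which is not part of the AI-generated code above, is to standardize the features inside a scikit-learn `Pipeline` so that the scaler is fitted on the training rows only. The sketch below assumes a 0/1 SalesUp and uses a synthetic, made-up `demo` DataFrame instead of the real CSV.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the review data (made-up values, for illustration only)
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Rating": rng.integers(1, 6, size=n),
    "Length": rng.integers(20, 500, size=n),
    "Sentiment": rng.choice([-1, 1], size=n),
    "Votes": rng.integers(0, 30, size=n),
})
demo["SalesUp"] = ((demo["Rating"] + 2 * demo["Sentiment"]) >= 4).astype(int)

features = ["Rating", "Length", "Sentiment", "Votes"]
X_train, X_test, y_train, y_test = train_test_split(
    demo[features], demo["SalesUp"], test_size=0.2, random_state=42
)

# The pipeline fits the scaler on the training rows and reuses the same
# scaling for the test rows and for any new observation
svm_pipeline = make_pipeline(StandardScaler(), SVC(kernel="linear", random_state=42))
svm_pipeline.fit(X_train, y_train)

svm_accuracy = accuracy_score(y_test, svm_pipeline.predict(X_test))
print(f"Scaled SVM accuracy: {svm_accuracy * 100:.2f}%")

new_observation = pd.DataFrame(
    {"Rating": [5], "Length": [200], "Sentiment": [-1], "Votes": [5]}
)
svm_prediction = svm_pipeline.predict(new_observation)[0]
print(f"Predicted SalesUp for new observation: {svm_prediction}")
```

Because the pipeline carries its own scaler, the new observation can be passed in raw units, with no separate `transform` call.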

## K-Nearest Neighbors (KNN) Model (K = 5)

In [None]:
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the data (the same review dataset; change 'ReviewData.csv' to match your file name or path)
data = pd.read_csv('ReviewData.csv')

# Convert Sentiment to numeric (+1 for positive, -1 for negative)
data['Sentiment'] = data['Sentiment'].map({'positive': 1, 'negative': -1})

# Define independent variables and dependent variable
X = data[['Rating', 'Length', 'Sentiment', 'Votes']]
y = data['SalesUp']

# Split data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and fit the KNN model with k=5
knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(X_train, y_train)

# Predict on the test set and calculate accuracy
y_pred = knn_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"KNN Model Accuracy (k=5): {accuracy * 100:.2f}%")

# Predict SalesUp for a new observation
new_observation = pd.DataFrame({
    'Rating': [5],
    'Length': [200],
    'Sentiment': [-1],  # negative sentiment
    'Votes': [5]
})

predicted_salesup = knn_model.predict(new_observation)
print(f"Predicted SalesUp for new observation: {predicted_salesup[0]}")
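
K = 5 is the value fixed in the result table. To see how sensitive the model is to that choice, one option is to compare a few values of K with 5-fold cross-validation. This is a hedged sketch rather than part of the original exercise: the candidate values (1, 3, 5, 7), the names `scores_by_k` and `best_k`, and the synthetic `demo` DataFrame are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the review data (made-up values, for illustration only)
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Rating": rng.integers(1, 6, size=n),
    "Length": rng.integers(20, 500, size=n),
    "Sentiment": rng.choice([-1, 1], size=n),
    "Votes": rng.integers(0, 30, size=n),
})
demo["SalesUp"] = ((demo["Rating"] + 2 * demo["Sentiment"]) >= 4).astype(int)

X = demo[["Rating", "Length", "Sentiment", "Votes"]]
y = demo["SalesUp"]

# Mean 5-fold cross-validated accuracy for each candidate K
scores_by_k = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in (1, 3, 5, 7)
}

for k, score in scores_by_k.items():
    print(f"K = {k}: mean accuracy {score * 100:.2f}%")

# K with the highest mean accuracy on this (synthetic) data
best_k = max(scores_by_k, key=scores_by_k.get)
print(f"Best K among candidates: {best_k}")
```

Cross-validation averages over several splits, so it gives a steadier estimate than the single 80/20 split used in the script above.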

## Neural Network Model

In [None]:
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load the data (the same review dataset; change 'ReviewData.csv' to match your file name or path)
data = pd.read_csv('ReviewData.csv')

# Convert Sentiment to numeric (+1 for positive, -1 for negative)
data['Sentiment'] = data['Sentiment'].map({'positive': 1, 'negative': -1})

# Define independent variables and dependent variable
X = data[['Rating', 'Length', 'Sentiment', 'Votes']]
y = data['SalesUp']

# Split data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features for better neural network performance.
# The scaler is fitted on the training set only, so no test-set information leaks into training.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create and fit the Neural Network (MLPClassifier)
nn_model = MLPClassifier(hidden_layer_sizes=(100,), activation='relu', solver='adam', max_iter=500, random_state=42)
nn_model.fit(X_train, y_train)

# Predict on the test set and calculate accuracy
y_pred = nn_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Neural Network Model Accuracy: {accuracy * 100:.2f}%")

# Predict SalesUp for a new observation
new_observation = pd.DataFrame({
    'Rating': [5],
    'Length': [200],
    'Sentiment': [-1],  # negative sentiment
    'Votes': [5]
})

# Scale the new observation using the same scaler
new_observation_scaled = scaler.transform(new_observation)

predicted_salesup = nn_model.predict(new_observation_scaled)
print(f"Predicted SalesUp for new observation: {predicted_salesup[0]}")
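
The four scripts above repeat the same load, split, fit, and predict steps. As a convenience, those steps can be wrapped in one function that returns the result table directly. The sketch below is one possible arrangement, not the AI-generated code itself: `compare_models` is a hypothetical helper, the neural network is wrapped in a `Pipeline` so that scaling happens inside the model, SalesUp is assumed to be coded 0/1, and `demo` is synthetic, made-up data that stands in for the real CSV.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

FEATURES = ["Rating", "Length", "Sentiment", "Votes"]


def compare_models(data, new_observation, target="SalesUp"):
    """Fit the four classifiers on one shared split and tabulate the results."""
    X_train, X_test, y_train, y_test = train_test_split(
        data[FEATURES], data[target], test_size=0.2, random_state=42
    )
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Support Vector Machine": SVC(kernel="linear", random_state=42),
        "Nearest Neighbor (K = 5)": KNeighborsClassifier(n_neighbors=5),
        "Neural Network": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=42),
        ),
    }
    rows = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        rows.append({
            "ML Model": name,
            "Accuracy": accuracy_score(y_test, model.predict(X_test)),
            "Predicted Value": model.predict(new_observation[FEATURES])[0],
        })
    return pd.DataFrame(rows)


# Synthetic stand-in for the review data (made-up values, for illustration only)
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Rating": rng.integers(1, 6, size=n),
    "Length": rng.integers(20, 500, size=n),
    "Sentiment": rng.choice([-1, 1], size=n),
    "Votes": rng.integers(0, 30, size=n),
})
demo["SalesUp"] = ((demo["Rating"] + 2 * demo["Sentiment"]) >= 4).astype(int)

new_observation = pd.DataFrame(
    {"Rating": [5], "Length": [200], "Sentiment": [-1], "Votes": [5]}
)

table = compare_models(demo, new_observation)
print(table.to_string(index=False))
```

To fill in the result table for the exercise, pass the DataFrame loaded from the review CSV (after the Sentiment conversion) in place of `demo`.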