# Linear Regression - Challenge: Pokémon Identification

## Overview

In this notebook, we will walk through a simple example of linear regression using a Pokémon dataset. No prior knowledge of linear regression is assumed. We will:
- Introduce the concept of linear regression
- Load and inspect the Pokémon data
- Visualize relationships between Pokémon features and legendary status
- Prepare the data and train a linear regression model
- Evaluate model performance and interpret the results

**Can we predict whether a Pokémon is legendary based on its characteristics?**

## Setting Up Our Data Tools

Before we start working with any data, we need to bring in some helpful tools. These tools let us:
  1. Load and organize information (like a digital spreadsheet).
  2. Create pictures and charts to see patterns.
  3. Build a simple model to make predictions.

In [None]:
# IMPORTS GO HERE
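
One possible set of imports for the three jobs listed above, assuming pandas, matplotlib, and scikit-learn are installed. Only `LinearRegression` is named later in this notebook; the other imports are our guess at what the remaining cells need.

```python
# 1) Load and organize tables of data
import pandas as pd

# 2) Draw charts
import matplotlib.pyplot as plt

# 3) Build the prediction model and score it
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
```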

## Loading Our Data Sets

 Now that our tools are ready, we need to bring in the actual information
 we’ll work with. These files are in CSV format (think of them as simple
 text-based spreadsheets). We load two separate tables:
  1. train: the “training” data used to teach our prediction model.
  2. test: the “testing” data used later to check how well our model learned.


In [None]:
# Read the training data file into a pandas “DataFrame” (a table-like object).
# INSERT HERE

# Read the testing data file into another DataFrame.
# INSERT HERE
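
A minimal sketch of the loading step. The real file names are not given here, so this example reads a tiny made-up CSV from a string; in the notebook you would pass the path of the training or testing file to `pd.read_csv` instead. The rows and values are toy data, and the column layout is an assumption based on the columns mentioned later (Type 1, Attack, Defense, Total, Legendary).

```python
import io
import pandas as pd

# Toy stand-in for the real training file (hypothetical rows, not real data).
csv_text = """Name,Type 1,Type 2,HP,Attack,Defense,Total,Legendary
A,Grass,Poison,45,50,50,320,False
B,Fire,,40,55,45,310,False
C,Psychic,,105,110,90,680,True
"""

# pd.read_csv accepts a file path or any file-like object.
train = pd.read_csv(io.StringIO(csv_text))

# Quick inspection: size of the table and its first rows.
print(train.shape)
print(train.head())
```

An empty cell in the file (the missing `Type 2` for rows B and C) is loaded as a missing value, which matters later when the type columns are converted to numbers.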

## Visualizing Attack vs. Defense

Here we draw a scatter plot to explore how a Pokémon’s Attack stat relates to its Defense stat. Each dot represents one Pokémon, with its position showing Attack on the horizontal axis and Defense on the vertical axis. Coloring by Legendary status highlights whether legendary Pokémon cluster in different regions, and a bit of transparency makes overlapping points easier to see.

In [None]:
# INSERT SCATTERPLOT HERE

# Add a title and labels to make the chart understandable

# Display the completed plot on screen
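
One way to draw the scatter plot described above, sketched with plain matplotlib on a small made-up table. The `toy` DataFrame and its values are hypothetical; in the notebook you would use the training table instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere; omit this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for the training table.
toy = pd.DataFrame({
    "Attack":    [50, 55, 110, 80, 130, 60],
    "Defense":   [50, 45, 90, 70, 100, 65],
    "Legendary": [False, False, True, False, True, False],
})

fig, ax = plt.subplots(figsize=(6, 4))

# Draw one group per Legendary value so each gets its own color and legend entry.
for is_legendary, group in toy.groupby("Legendary"):
    ax.scatter(group["Attack"], group["Defense"],
               alpha=0.6,  # transparency makes overlapping points easier to see
               label=f"Legendary = {is_legendary}")

# Add a title and labels to make the chart understandable.
ax.set_title("Attack vs. Defense by Legendary status")
ax.set_xlabel("Attack")
ax.set_ylabel("Defense")
ax.legend()

# Display the completed plot (shown inline in a notebook).
plt.show()
```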


## Preparing Data for the Prediction Model

 Before our model can learn to predict the Total stat, we need to turn all our information into numbers and line up the training and testing tables. We will:
   1. Convert the two type columns into separate yes/no columns.
   2. Ensure the test table has the same columns as the train table.
   3. Choose which columns (features) the model will use to learn.
   4. Separate those features (X) from the value we want to predict (y).


In [None]:
# 1) One-hot encode the two type columns
#    e.g. "Type 1_Fire" = 1 if the Pokémon’s primary type is Fire, else 0

# 2) Align test to train so they share the same columns

# 3) Choose features: core stats plus all the new type-dummy columns


# 4) Split into inputs (X) and target (y)
#    X = what the model sees (features)
#    y = what it should predict (“Total” stat)
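
The four steps can be sketched on two tiny made-up tables. The column names `Type 1`, `Type 2`, and `Total` come from the comments above; the stat columns chosen as "core stats" and all of the values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy tables standing in for the real train and test data.
train = pd.DataFrame({
    "Type 1":  ["Fire", "Water", "Grass", "Fire"],
    "Type 2":  [None, "Ice", "Poison", "Flying"],
    "HP":      [40, 60, 45, 78],
    "Attack":  [55, 65, 50, 84],
    "Defense": [45, 80, 50, 78],
    "Total":   [310, 430, 320, 534],
})
test = pd.DataFrame({
    "Type 1":  ["Water", "Electric"],
    "Type 2":  [None, "Ice"],
    "HP":      [50, 35],
    "Attack":  [48, 55],
    "Defense": [65, 40],
    "Total":   [320, 318],
})

# 1) One-hot encode the two type columns (missing types get all zeros).
train_enc = pd.get_dummies(train, columns=["Type 1", "Type 2"])
test_enc = pd.get_dummies(test, columns=["Type 1", "Type 2"])

# 2) Align test to train: add columns test lacks (filled with 0),
#    drop columns train never saw, and match the column order.
test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)

# 3) Choose features: core stats plus all the new type-dummy columns.
core_stats = ["HP", "Attack", "Defense"]
type_cols = [c for c in train_enc.columns if c.startswith("Type ")]
features = core_stats + type_cols

# 4) Split into inputs (X) and target (y).
X_train, y_train = train_enc[features], train_enc["Total"]
X_test, y_test = test_enc[features], test_enc["Total"]

print(features)
```

Here the test table contains an Electric type that never appears in the training table, so its dummy column is dropped during alignment, while training-only columns such as `Type 1_Fire` are added to the test table as zeros.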

In [None]:
model = LinearRegression()
model.fit(X_train, y_train)

# display learned coefficients
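
A small sketch of fitting the model and displaying what it learned. The data is made up, and the target is deliberately built as `2 * HP + 3 * Attack + 10` so we can check that the fitted coefficients recover those numbers; real data will not be this clean.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy features.
X_train = pd.DataFrame({
    "HP":     [45, 60, 80, 100, 50],
    "Attack": [49, 62, 82, 120, 70],
})
# Target constructed from a known linear rule (for illustration only).
y_train = 2 * X_train["HP"] + 3 * X_train["Attack"] + 10

model = LinearRegression()
model.fit(X_train, y_train)

# Pair each learned coefficient with its feature name.
coefs = pd.Series(model.coef_, index=X_train.columns)
print(coefs)
print("Intercept:", model.intercept_)
```

Each coefficient is the change in the predicted value when that feature goes up by one unit and the other features stay fixed.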

## Evaluating Our Model’s Performance

 After training our model, we check how well it predicts the Total stat on new, unseen Pokémon. We will:
   1. Ask the model to make predictions.
   2. Calculate two simple scores:
      - R²: the share of the variation in the actual values that our predictions capture.
      - MSE: the average of our squared prediction mistakes.
   3. Print those scores for a quick summary.
   4. Draw a chart that lines up actual vs. predicted values to
      visually inspect where we’re doing well or off-target.

In [None]:
# 1) Ask the model to predict the Total stat on the test set

# 2) Measure performance with R² (closer to 1 is better) and Mean Squared Error (smaller is better)

# 3) Show the results in the console

# 4) Plot actual vs. predicted to see how predictions stack up

# Draw a diagonal red dashed line representing perfect predictions

# Add informative title and axis labels

# Display the plot
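
A sketch of the scoring and plotting steps, using hand-written actual and predicted values so the arithmetic is easy to follow. The numbers are hypothetical; in the notebook, `y_pred` would come from `model.predict(X_test)`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere; omit this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error

# Hypothetical actual values and predictions.
y_test = np.array([300, 400, 500, 600])
y_pred = np.array([310, 390, 520, 580])

# R² (closer to 1 is better) and Mean Squared Error (smaller is better).
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print(f"R²:  {r2:.3f}")
print(f"MSE: {mse:.1f}")

# Actual vs. predicted: points on the diagonal are perfect predictions.
fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(y_test, y_pred, alpha=0.7)
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
ax.plot(lims, lims, "r--", label="Perfect prediction")

ax.set_title("Actual vs. predicted Total")
ax.set_xlabel("Actual Total")
ax.set_ylabel("Predicted Total")
ax.legend()
plt.show()
```

For these numbers the errors are 10, 10, 20, and 20, so the MSE is (100 + 100 + 400 + 400) / 4 = 250, and R² is 1 - 1000 / 50000 = 0.98.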

## Interpreting Model Predictions and Key Influencers

At this stage, we want to:
   1. Ask our trained model to predict the Total stat on the test data.
   2. See how accurate those predictions are with two simple scores.
   3. Find out which Pokémon features push the Total stat up or down the most.
   4. Print everything in an easy-to-read format.

In [None]:
# 1) Predict the Total stat for each Pokémon in the test set

# 2) Measure prediction accuracy:
#    - R² score: fraction of variance in Total explained by the model (closer to 1 is better)
#    - Mean Squared Error: average squared difference between predicted and actual Total (smaller is better)

# 3) Examine model coefficients to see which features most influence the Total stat:
#    - Pair each feature with its learned coefficient
#    - Sort to find the top 5 features that increase Total
#    - Sort to find the top 5 features that decrease Total

# 4) Display the results:
#    - Print R² and MSE
#    - Show the five strongest positive and negative drivers in markdown tables
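
Step 3 can be sketched as follows. The feature names and coefficient values are made up for illustration; in the notebook they would come from `model.coef_` and the feature columns. The helper `as_markdown` is a hypothetical function written for this example, and `n` is set to 2 only because the toy list is short (the notebook asks for 5).

```python
import pandas as pd

# Hypothetical coefficients, as if read from a fitted model.
coefs = pd.Series({
    "HP": 0.9,
    "Attack": 1.0,
    "Defense": 1.1,
    "Type 1_Dragon": 12.5,
    "Type 1_Bug": -8.0,
    "Type 1_Normal": -2.1,
    "Type 2_Flying": 3.2,
})

n = 2  # use 5 for the real model
top_positive = coefs.nlargest(n)   # features that raise the prediction most
top_negative = coefs.nsmallest(n)  # features that lower the prediction most


def as_markdown(series, value_name="Coefficient"):
    """Format a Series as a two-column markdown table (hypothetical helper)."""
    lines = [f"| Feature | {value_name} |", "| --- | --- |"]
    lines += [f"| {name} | {value:.2f} |" for name, value in series.items()]
    return "\n".join(lines)


print(as_markdown(top_positive))
print()
print(as_markdown(top_negative))
```

Coefficients are only comparable at face value when features share a scale: a yes/no type column changes by at most 1, while a stat such as Attack can change by dozens of points.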

