
# Simple Linear Regression with Scikit-Learn


## Table of Contents

- [Description](#description)
- [Features](#features)
- [Dataset](#dataset)
- [Installation](#install-dependencies)
- [Usage](#usage)
- [Project Structure](#project-structure)
- [Results](#results)
- [Acknowledgements](#acknowledgements)

## Description

This project implements a **Simple Linear Regression** model using Python's `scikit-learn` library. The goal is to establish a relationship between an independent variable \( x \) and a dependent variable \( y \), allowing predictions of \( y \) based on new \( x \) values.
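
As a sketch of the underlying model (the symbols \( \beta_0 \), \( \beta_1 \), and \( \varepsilon \) are notation introduced here for illustration, not taken from the project), simple linear regression assumes

\[
y = \beta_0 + \beta_1 x + \varepsilon
\]

where \( \beta_0 \) is the intercept, \( \beta_1 \) is the slope, and \( \varepsilon \) is the error term. Ordinary least squares, which scikit-learn's `LinearRegression` uses, picks the intercept and slope that minimize the sum of squared residuals over the \( n \) training points:

\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
\]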

## Features

- **Data Loading:** Reads dataset from a CSV file.
- **Data Preprocessing:** Handles missing or malformed data.
- **Model Training:** Splits data into training and testing sets and fits a linear regression model.
- **Evaluation:** Calculates Mean Squared Error (MSE) to assess model performance.
- **Visualization:** (Optional) Plots the regression line against the data points for visual analysis.

## Dataset

The dataset consists of two columns:

- **x:** Independent variable.
- **y:** Dependent variable.



## Installation

### Install Dependencies

Ensure you have [Python](https://www.python.org/downloads/) installed. Then, install the required Python libraries:

```bash
pip install -r requirements.txt
```

*Alternatively, install directly:*

```bash
pip install pandas scikit-learn matplotlib
```

## Usage

### 1. Prepare Your Dataset

- Ensure your dataset is in CSV format with two columns: `x` and `y`.
- Example (`data.csv`):

```csv
x,y
24,21.54945196
50,47.46446305
15,17.21865634
38,36.58639803
87,87.28898389
...
```

- **Important:** Remove or correct any rows with missing or malformed data (e.g., rows where `x` or `y` is missing).
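
The cleanup described above can be sketched with pandas. This is a minimal illustration, not the project's own preprocessing code: the in-memory CSV and its two bad rows are made up for the example, and `raw_csv` and `clean` are hypothetical names.

```python
import io

import pandas as pd

# Hypothetical raw file contents: one row has a missing y, one has a non-numeric x
raw_csv = io.StringIO(
    "x,y\n"
    "24,21.54945196\n"
    "50,\n"
    "abc,17.21865634\n"
    "38,36.58639803\n"
)

data = pd.read_csv(raw_csv)

# Coerce both columns to numbers; entries that cannot be parsed become NaN
data["x"] = pd.to_numeric(data["x"], errors="coerce")
data["y"] = pd.to_numeric(data["y"], errors="coerce")

# Drop every row where x or y is missing, then renumber the remaining rows
clean = data.dropna(subset=["x", "y"]).reset_index(drop=True)
print(clean)
```

Coercing before dropping catches malformed values as well as empty cells, which a plain `dropna()` on its own would miss when a column was read as text.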

### 2. Update the File Path

In the Python script (`linear_regression.py`), update the file path to point to your dataset.

```python
# Replace 'path/to/your/data.csv' with the actual path to your CSV file
data = pd.read_csv('path/to/your/data.csv')
```

### 3. Run the Script

Execute the Python script to train the model and evaluate its performance.

```bash
python linear_regression.py
```

### 4. (Optional) View Visualization

If your script includes the optional plotting step, it displays a scatter plot of the data points with the fitted regression line drawn over them.
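
One way such a plot could be produced is sketched below. It assumes the two-column `x`/`y` layout and reuses the five sample rows from the example `data.csv`; the output file name `regression_plot.png` is hypothetical, and the non-interactive `Agg` backend is chosen only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; the plot is saved, not shown

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression

# The five sample rows from the example data.csv
data = pd.DataFrame({
    "x": [24, 50, 15, 38, 87],
    "y": [21.54945196, 47.46446305, 17.21865634, 36.58639803, 87.28898389],
})

# Fit on all five rows; this sketch skips the train/test split
model = LinearRegression().fit(data[["x"]], data["y"])

# Sort by x so the fitted line is drawn from left to right
line = data[["x"]].sort_values("x")

fig, ax = plt.subplots()
ax.scatter(data["x"], data["y"], label="Data points")
ax.plot(line["x"], model.predict(line), color="red", label="Regression line")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("regression_plot.png")  # hypothetical output file name
```

In an interactive session, dropping the `matplotlib.use("Agg")` line and calling `plt.show()` instead of `fig.savefig(...)` opens the plot in a window.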

## Project Structure

```
simple-linear-regression/
│
├── data/
│   └── data.csv          # Your dataset
│
├── linear_regression.py  # Main Python script
│
├── requirements.txt      # Python dependencies
│
└── README.md             # Project documentation
```


## Results

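After running the script, you should see output in the following form. The values shown come from the run recorded at the end of this document; yours will depend on your dataset.

- **Mean Squared Error (MSE):** The average squared difference between the actual and predicted values. A lower MSE indicates a better fit.

  ```
  Mean Squared Error: 8.2681328898678
  ```

- **R-squared:** The proportion of the variance in `y` that the model explains. Values closer to 1 indicate a better fit.

  ```
  R-squared: 0.9905086602447534
  ```

- **Model Coefficients:**

  ```
  Coefficients: [1.00211147]
  Intercept: -0.20993423426160973
  ```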
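
To make the MSE definition concrete, the sketch below computes it by hand and compares the result with `mean_squared_error` from scikit-learn. The three actual and predicted values are made-up toy numbers, not results from this project.

```python
from sklearn.metrics import mean_squared_error

# Toy values chosen for illustration only
y_true = [3.0, 5.0, 7.0]
y_pred = [2.5, 5.0, 8.0]

# Squared difference for each actual/predicted pair: 0.25, 0.0, 1.0
squared_errors = [(actual - predicted) ** 2 for actual, predicted in zip(y_true, y_pred)]

# MSE is the mean of the squared differences
mse_manual = sum(squared_errors) / len(squared_errors)
mse_sklearn = mean_squared_error(y_true, y_pred)

print(f"Manual MSE:  {mse_manual:.4f}")
print(f"sklearn MSE: {mse_sklearn:.4f}")
```

Because the errors are squared, MSE is expressed in squared units of `y`, and a single large miss weighs more heavily than several small ones.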



## Acknowledgements

- [Scikit-Learn Documentation](https://scikit-learn.org/stable/documentation.html)
- [Pandas Documentation](https://pandas.pydata.org/docs/)
- [Matplotlib Documentation](https://matplotlib.org/stable/contents.html)
---




```python
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset
# Replace the path below with the actual path to your CSV file
data = pd.read_csv('/content/Project3_SLR.csv')

# Drop rows with missing values
data = data.dropna()

# Assuming the dataset has the last column as the target (dependent variable)
X = data.iloc[:, :-1]  # Features (independent variables)
y = data.iloc[:, -1]   # Target (dependent variable)

# Split the data into training and testing sets
# (test_size=0.6 means 40% training, 60% testing; use test_size=0.2 for an 80/20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.6, random_state=42)

# Initialize the Linear Regression model
model = LinearRegression()

# Fit the model on the training data
model.fit(X_train, y_train)

# Predict on the test data
y_pred = model.predict(X_test)

# Evaluate the model using Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

# Calculate R-squared value
r_squared = r2_score(y_test, y_pred)
print(f"R-squared: {r_squared}")

# Optional: Print the coefficients and intercept
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
```

Output:

```
Mean Squared Error: 8.2681328898678
R-squared: 0.9905086602447534
Coefficients: [1.00211147]
Intercept: -0.20993423426160973
```
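
The printed slope and intercept define the fitted line, so they can be used to predict `y` for a new `x` without reloading the model. The sketch below copies the values from the output above; the `predict` helper and the input `x = 50` are hypothetical choices for illustration. It also converts the MSE to a root mean squared error (RMSE), which is in the same units as `y`.

```python
import math

# Values copied from the recorded output above
slope = 1.00211147
intercept = -0.20993423426160973
mse = 8.2681328898678


def predict(x):
    """Evaluate the fitted line y = slope * x + intercept (hypothetical helper)."""
    return slope * x + intercept


# RMSE is the square root of the MSE, expressed in the units of y
rmse = math.sqrt(mse)

print(f"Predicted y for x = 50: {predict(50):.4f}")
print(f"RMSE: {rmse:.4f}")
```

With the fitted `model` object still in memory, `model.predict` on a one-column DataFrame of new `x` values gives the same predictions.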
