**Linear regression**

Q1. Load the Boston Housing data from https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv into a DataFrame
and display the top 5 rows using the DataFrame's head() method.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv")
print(df.head())

Q2. Determine and print which single feature produces the best score for a linear regression model trained on that one feature.
Print the corresponding score.
Print the corresponding equation, using the feature name instead of x1.
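For a single feature, least squares has a closed form: the slope is cov(x, y) / var(x), the intercept is mean(y) - slope * mean(x), and the score that `r2_score` reports is R² = 1 - SS_res / SS_tot. The minimal NumPy sketch below runs on synthetic data (not the Boston file) and also checks that a single-feature R² equals the squared Pearson correlation, which suggests the winning feature is the one most strongly correlated with `medv`, in either direction. `simple_ols` is a hypothetical helper name.

```python
import numpy as np

def simple_ols(x, y):
    """Least-squares fit of y ~ slope * x + intercept; returns (slope, intercept, r2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_c = x - x.mean()  # centered feature
    y_c = y - y.mean()  # centered target
    slope = (x_c @ y_c) / (x_c @ x_c)  # cov(x, y) / var(x)
    intercept = y.mean() - slope * x.mean()
    residuals = y - (slope * x + intercept)
    r2 = 1.0 - (residuals @ residuals) / (y_c @ y_c)  # 1 - SS_res / SS_tot
    return slope, intercept, r2

# Synthetic data for illustration only (not the Boston Housing file)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = -2.0 * x + 30.0 + rng.normal(0, 3.0, size=200)

slope, intercept, r2 = simple_ols(x, y)
r = np.corrcoef(x, y)[0, 1]  # Pearson correlation
print(f"slope={slope:.3f}  intercept={intercept:.3f}  R^2={r2:.4f}  r^2={r**2:.4f}")
```

The same loop over candidate columns then amounts to picking the column with the largest squared correlation with the target.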


In [None]:
y = df["medv"]  # target: median home value (in $1000s)
feature_cols = df.columns.drop("medv")  # every column except the target

best_score = -np.inf
best_feature = None
best_model = None

# Fit a one-feature model per column and keep the one with the highest R^2
for feature in feature_cols:
    X = df[[feature]]
    model = LinearRegression()
    model.fit(X, y)
    score = r2_score(y, model.predict(X))
    if score > best_score:
        best_score = score
        best_feature = feature
        best_model = model

print(f"Best feature for linear regression: {best_feature}")
print(f"Corresponding R-squared score: {best_score:.4f}")
print(f"Corresponding equation: medv = {best_model.coef_[0]:.4f} * {best_feature} + {best_model.intercept_:.4f}")
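Printing `coef_` and `intercept_` directly can produce strings like `+ -3.2` when the intercept is negative. A small formatter, sketched here as the hypothetical helper `format_equation`, writes the signs cleanly and accepts any number of features, so it could also describe the all-feature model. It only reads the standard `coef_` and `intercept_` attributes of a fitted `LinearRegression`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def format_equation(model, feature_names, target="medv", digits=3):
    """Render a fitted linear model as 'target = c1 * f1 + c2 * f2 ... + b' with tidy signs."""
    terms = []
    for coef, name in zip(model.coef_, feature_names):
        sign = "-" if coef < 0 else "+"
        terms.append(f"{sign} {abs(coef):.{digits}f} * {name}")
    sign = "-" if model.intercept_ < 0 else "+"
    terms.append(f"{sign} {abs(model.intercept_):.{digits}f}")
    body = " ".join(terms)
    # The first term needs no leading "+", and a leading "-" is attached to the number
    body = body[2:] if body.startswith("+ ") else "-" + body[2:]
    return f"{target} = {body}"

# Exactly linear toy data: medv = -0.5 * lstat + 10 (illustration only)
X_demo = np.arange(10, dtype=float).reshape(-1, 1)
y_demo = -0.5 * X_demo[:, 0] + 10.0
demo = LinearRegression().fit(X_demo, y_demo)
print(format_equation(demo, ["lstat"]))  # medv = -0.500 * lstat + 10.000
```

In the notebook this would be called as `format_equation(best_model, [best_feature])`.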

Q3. Plot a scatterplot of the best-scoring feature against medv, together with the corresponding regression line, on the same graph. Make the line color 'red'.

In [None]:
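X_best = df[[best_feature]]  # input for the winning feature (X from the loop holds the last column tried)
plt.scatter(df[best_feature], df["medv"], label="Data Points")
plt.plot(df[best_feature], best_model.predict(X_best), color='red', label="Regression Line")
plt.xlabel(best_feature)
plt.ylabel("medv")
plt.legend()
plt.show()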
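Because the fitted line is straight, it is fully determined by two points, so one option is to predict only at the smallest and largest observed feature values. The sketch wraps that in a hypothetical `plot_fit` helper and runs it on synthetic data; the `Agg` backend is there only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

def plot_fit(x, y, model, feature_name, target_name="medv"):
    """Scatter (x, y) and overlay the model's fitted line in red over the observed x-range."""
    x = np.asarray(x, dtype=float)
    fig, ax = plt.subplots()
    ax.scatter(x, y, s=10, label="Data points")
    # A straight line is fixed by two points, so predict only at the x-range endpoints
    x_line = np.array([x.min(), x.max()])
    y_line = model.predict(x_line.reshape(-1, 1))
    ax.plot(x_line, y_line, color="red", label="Regression line")
    ax.set_xlabel(feature_name)
    ax.set_ylabel(target_name)
    ax.legend()
    return fig, ax

# Synthetic data for illustration only
rng = np.random.default_rng(1)
x = rng.uniform(2, 35, size=100)
y = 35.0 - 0.9 * x + rng.normal(0, 4.0, size=100)
model = LinearRegression().fit(x.reshape(-1, 1), y)
fig, ax = plot_fit(x, y, model, "lstat")
```

If the model was fitted on a DataFrame, as in the cells above, scikit-learn warns when `predict` receives a bare NumPy array; passing `pd.DataFrame({best_feature: x_line})` instead avoids the warning.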

Q4. Train a linear regression model using every available feature as an input (aside from the target feature) and output its score.

In [None]:
X_all = df.drop("medv", axis=1)  # every column except the target
y = df["medv"]
model_all = LinearRegression()
model_all.fit(X_all, y)
y_pred_all = model_all.predict(X_all)
score_all = r2_score(y, y_pred_all)
print(f"R-squared score using all features: {score_all:.4f}")
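The all-feature R² is an in-sample score, and adding a column to a least-squares model with an intercept can never lower in-sample R², so it is not directly comparable with the single-feature score from Q2. Two common checks are adjusted R², which penalizes the number of predictors p via 1 - (1 - R²)(n - 1)/(n - p - 1), and R² on a held-out split. A sketch on synthetic data with deliberately useless noise columns; `adjusted_r2` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def adjusted_r2(r2, n_samples, n_features):
    """Penalize R^2 for the number of predictors: 1 - (1 - R^2)(n - 1)/(n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Synthetic data: 3 informative columns plus 10 pure-noise columns (illustration only)
rng = np.random.default_rng(42)
n = 120
X_info = rng.normal(size=(n, 3))
X_noise = rng.normal(size=(n, 10))
X = np.hstack([X_info, X_noise])
y = X_info @ np.array([3.0, -2.0, 1.5]) + rng.normal(0, 2.0, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_tr, y_tr)
r2_train = r2_score(y_tr, model.predict(X_tr))  # in-sample
r2_test = r2_score(y_te, model.predict(X_te))   # held-out
print(f"train R^2={r2_train:.3f}  "
      f"adjusted={adjusted_r2(r2_train, len(y_tr), X.shape[1]):.3f}  "
      f"test R^2={r2_test:.3f}")
```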

**Logistic regression**

Download the dataset from https://www.kaggle.com/datasets/amarsharma768/bmd-data and load it into a DataFrame.

In [None]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
df = pd.read_csv("bmd.csv")

Call the head() method to see what you're working with.

In [None]:
print(df.head())
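Beyond `head()`, two quick checks matter for this task: the class balance of the target, since F1 is sensitive to it, and missing values, since `LogisticRegression` raises an error on NaN input. The sketch runs on a small made-up frame; its column names mirror the ones used in this notebook, and the label values `fracture` / `no fracture` are an assumption about how `bmd.csv` encodes the target. `summarize_target` is a hypothetical helper.

```python
import pandas as pd

def summarize_target(df, target="fracture"):
    """Return the share of each target class and the number of missing values per column."""
    class_share = df[target].value_counts(normalize=True)
    missing = df.isna().sum()
    return class_share, missing

# Made-up toy rows standing in for bmd.csv (values are not from the dataset)
toy = pd.DataFrame({
    "weight_kg": [64.0, 78.5, 55.0, None, 70.2],
    "height_cm": [158.0, 171.0, 150.5, 165.0, 162.0],
    "bmd": [0.82, 0.95, 0.61, 0.74, 0.88],
    "fracture": ["no fracture", "no fracture", "fracture", "fracture", "no fracture"],
})
class_share, missing = summarize_target(toy)
print(class_share)
print(missing)
```

On the real data, `summarize_target(df)` would run the same checks.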

Use up to 3 explanatory features of your choice and split the data into training and testing sets; use 20% for testing.

In [None]:
features = ["weight_kg", "height_cm", "bmd"]
X = df[features]
y = df["fracture"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
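`test_size=0.2` holds out 20% of the rows. With a small, imbalanced dataset, also passing `stratify=y` keeps the class proportions the same in both splits, so the test F1 depends less on which rows happen to be drawn. This is an optional variation on the cell above, shown on synthetic labels with a 30/70 class split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic imbalanced labels for illustration: 30 positives, 70 negatives
X = np.arange(100).reshape(-1, 1)
y = np.array([1] * 30 + [0] * 70)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y  # stratify keeps the 30/70 ratio in both splits
)
print(len(y_te), y_te.mean(), y_tr.mean())  # 20 test rows, positive share 0.3 in each split
```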

Fit the logistic regression model and save the predictions on the test set into a variable *y_pred*.

In [None]:
model = LogisticRegression(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
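The three features used here are on very different scales (kilograms, centimetres, and bone mineral density values near 1), which can slow convergence of the default `lbfgs` solver and makes the default L2 penalty weigh the coefficients unevenly. A common remedy, sketched here on synthetic data, is to standardize inside a `Pipeline` so the scaler is fitted on the training split only. The fracture-generating rule in the sketch is invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (illustration only)
rng = np.random.default_rng(0)
n = 300
weight_kg = rng.normal(65, 12, n)
height_cm = rng.normal(160, 9, n)
bmd = rng.normal(0.8, 0.15, n)
X = np.column_stack([weight_kg, height_cm, bmd])

# Invented rule: lower bmd -> higher fracture probability (label 1 = fracture)
prob = 1.0 / (1.0 + np.exp(12.0 * (bmd - 0.7)))
y = (rng.random(n) < prob).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Scaling is learned on the training split only, then reused for the test split
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
print(f"test accuracy: {clf.score(X_te, y_te):.3f}")
```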

Use classification_report to find the F1-score of your model.
Try to get an F1-score above 0.85 (use a different set of 3 features or adjust the model parameters), and use a fixed random seed so the results are reproducible.
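One way to chase the F1 target is a cross-validated grid search over the regularization strength `C` and `class_weight`, scored by F1 and run on the training split only so the test score stays honest. A sketch on synthetic data from `make_classification`. Two caveats: `scoring="f1"` assumes numeric labels with 1 as the positive class, so with string labels such as those in `df["fracture"]` you would build a scorer with `make_scorer(f1_score, pos_label=...)`; and `LogisticRegression`'s `random_state` only affects the `sag`, `saga` and `liblinear` solvers, so with the default `lbfgs` solver reproducibility comes from the seeds passed to the split and the CV splitter.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42  # one seed for the data generator, the split and the CV shuffling

# Synthetic 3-feature binary problem standing in for the bmd data (illustration only)
X, y = make_classification(n_samples=300, n_features=3, n_informative=2,
                           n_redundant=0, weights=[0.7, 0.3], random_state=SEED)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=SEED, stratify=y)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {
    "logisticregression__C": [0.01, 0.1, 1.0, 10.0],          # inverse regularization strength
    "logisticregression__class_weight": [None, "balanced"],  # optionally upweight the minority class
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
search = GridSearchCV(pipe, param_grid, scoring="f1", cv=cv)
search.fit(X_tr, y_tr)  # tune on the training split only

test_f1 = f1_score(y_te, search.predict(X_te))  # refit best model, scored once on held-out data
print(search.best_params_, f"CV F1={search.best_score_:.3f}", f"test F1={test_f1:.3f}")
```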

In [None]:
report = classification_report(y_test, y_pred)
print(report)
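`classification_report` prints one F1 per class plus macro and weighted averages, and the task does not say which one should exceed 0.85. Passing `output_dict=True` returns the same numbers as a nested dict, which makes the chosen F1 easy to read out or check in code. Each per-class F1 is the harmonic mean 2PR / (P + R) of that class's precision P and recall R. A sketch with six hand-written labels: for the `fracture` class, one true positive, one false positive and one false negative give P = R = 0.5 and F1 = 0.5.

```python
from sklearn.metrics import classification_report, f1_score, precision_score, recall_score

# Six hand-written labels for illustration (not model output)
y_true = ["fracture", "no fracture", "no fracture", "fracture", "no fracture", "no fracture"]
y_hat = ["fracture", "no fracture", "fracture", "no fracture", "no fracture", "no fracture"]

report = classification_report(y_true, y_hat, output_dict=True)
f1_fracture = report["fracture"]["f1-score"]  # per-class F1
f1_macro = report["macro avg"]["f1-score"]     # unweighted mean over classes

# Same per-class value from the harmonic-mean formula
p = precision_score(y_true, y_hat, pos_label="fracture")
r = recall_score(y_true, y_hat, pos_label="fracture")
print(f1_fracture, f1_macro, 2 * p * r / (p + r))
```

Here the `no fracture` class has P = R = 0.75, so the macro average is (0.5 + 0.75) / 2 = 0.625.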