Popular(i,u) = α × Current_Popular(i) + β × Trend(i,t) + ϵ × Similarity(u,others)

In [None]:
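from sklearn.linear_model import LinearRegression

# Popular - the rating prediction
# Current_Popular - the movie's current rating
# Trend - computed from time and ratings
# Similarity - the collaborative-filtering prediction

# Assume `data` is your dataset (a pandas DataFrame) with columns P, X, Y, Z.
# P is the target; X, Y, Z presumably hold Current_Popular, Trend, and Similarity, in that order.
X = data[['X', 'Y', 'Z']]  # feature matrix
y = data['P']  # target values

# Create the linear regression model
model = LinearRegression()

# Fit the model
model.fit(X, y)

# Read out the weights A, B, C (the α, β, ϵ of the formula above)
A = model.coef_[0]
B = model.coef_[1]
C = model.coef_[2]

print(f"A = {A}, B = {B}, C = {C}")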
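
As a quick sanity check of this approach, the sketch below generates synthetic data from known weights and checks that LinearRegression recovers them. The weights (0.5, 0.3, 0.2), the value ranges, and the noise level are illustrative assumptions, not values from a real dataset; the column names follow the placeholder naming used above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500

# Hypothetical features: X = current rating, Y = trend, Z = collaborative-filtering prediction
data = pd.DataFrame({
    "X": rng.uniform(1, 5, n),
    "Y": rng.uniform(-1, 1, n),
    "Z": rng.uniform(1, 5, n),
})

# Assumed "true" weights, used only to generate the synthetic target P
true_weights = np.array([0.5, 0.3, 0.2])
data["P"] = data[["X", "Y", "Z"]].to_numpy() @ true_weights + rng.normal(0, 0.01, n)

model = LinearRegression()
model.fit(data[["X", "Y", "Z"]], data["P"])

alpha, beta, epsilon = model.coef_
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, epsilon = {epsilon:.3f}")
print(f"intercept = {model.intercept_:.3f}")
```

LinearRegression fits an intercept by default. The formula has no constant term, so passing fit_intercept=False is an option if the score should be a pure weighted sum.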


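One limitation of linear regression is that it captures only linear relationships. In many real-world problems, the relationship between the dependent and independent variables may be nonlinear. By generating polynomial features of the independent variables with the PolynomialFeatures class, we can fit such nonlinear relationships with a model that is still linear in its coefficients.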

In [None]:
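from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Split X and y into training and test sets
# (test_size and random_state are illustrative choices)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

poly_features = PolynomialFeatures(degree=3)
X_poly = poly_features.fit_transform(X_train)

# Retrain the model on the polynomial features
model_poly = LinearRegression()
model_poly.fit(X_poly, y_train)

# Predict and evaluate
y_pred_poly = model_poly.predict(poly_features.transform(X_test))
mse_poly = mean_squared_error(y_test, y_pred_poly)
print(f"Mean Squared Error with Polynomials: {mse_poly}")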
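
To make the transformation concrete, here is a minimal sketch of what PolynomialFeatures generates for a single sample. The sample values and the choice of three input features for the column count are assumptions made for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with two features, a = 2 and b = 3
sample = np.array([[2.0, 3.0]])

poly = PolynomialFeatures(degree=2)
expanded = poly.fit_transform(sample)

# Columns are ordered: 1, a, b, a^2, a*b, b^2
print(expanded)

# The number of generated columns grows quickly with the degree
n_cols_by_degree = {}
for degree in (1, 2, 3):
    counter = PolynomialFeatures(degree=degree).fit(np.zeros((1, 3)))
    n_cols_by_degree[degree] = counter.n_output_features_
print(n_cols_by_degree)
```

Because the column count rises steeply with the degree, a high-degree expansion can overfit easily, which is one reason it is often paired with regularization.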


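Regularization is a technique for preventing overfitting: it adds a penalty term to the loss function to limit model complexity. L1 regularization (Lasso) and L2 regularization (Ridge) are two common methods. In Scikit-Learn, they are available as the Lasso and Ridge classes: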

In [None]:
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error

# Lasso (L1) regularization
lasso_model = Lasso(alpha=0.1)
lasso_model.fit(X_train, y_train)
y_pred_lasso = lasso_model.predict(X_test)
mse_lasso = mean_squared_error(y_test, y_pred_lasso)
print(f"Mean Squared Error with Lasso: {mse_lasso}")

# Ridge (L2) regularization
ridge_model = Ridge(alpha=0.1)
ridge_model.fit(X_train, y_train)
y_pred_ridge = ridge_model.predict(X_test)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
print(f"Mean Squared Error with Ridge: {mse_ridge}")
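
The practical difference between the two penalties shows up in the fitted coefficients: the L1 penalty can set coefficients exactly to zero, while the L2 penalty only shrinks them. The sketch below illustrates this on synthetic data in which only two of five features matter; the data, the true coefficients, and the names ending in _demo are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)

# Synthetic data: only the first two of five features influence the target
X_demo = rng.normal(size=(200, 5))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

lasso_demo = Lasso(alpha=0.1).fit(X_demo, y_demo)
ridge_demo = Ridge(alpha=0.1).fit(X_demo, y_demo)

print("Lasso coefficients:", np.round(lasso_demo.coef_, 3))
print("Ridge coefficients:", np.round(ridge_demo.coef_, 3))
print("Coefficients set exactly to zero by Lasso:", int(np.sum(lasso_demo.coef_ == 0)))
```

Note that the same alpha value does not mean the same penalty strength in both classes, since their objectives are scaled differently, so the two alphas are best tuned separately.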


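Feature selection

In datasets with many features, feature selection can reduce model complexity and improve interpretability. The SelectKBest class, combined with a statistical test such as f_regression, selects the most relevant features: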

In [None]:
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.metrics import mean_squared_error

# Select the k most important features
selector = SelectKBest(score_func=f_regression, k=2)
X_train_selected = selector.fit_transform(X_train, y_train)
X_test_selected = selector.transform(X_test)

# Train and evaluate the model on the selected features
model_kbest = LinearRegression()
model_kbest.fit(X_train_selected, y_train)
y_pred_kbest = model_kbest.predict(X_test_selected)
mse_kbest = mean_squared_error(y_test, y_pred_kbest)
print(f"Mean Squared Error with KBest Features: {mse_kbest}")
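
It is often useful to check which features the selector actually kept. The sketch below does this with the scores_ attribute and get_support on synthetic data where only features 0 and 2 drive the target; the data and the names ending in _demo are assumptions made for the example.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(0)

# Synthetic data: the target depends on features 0 and 2 only
X_demo = rng.normal(size=(200, 4))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 2] + rng.normal(scale=0.1, size=200)

selector_demo = SelectKBest(score_func=f_regression, k=2).fit(X_demo, y_demo)

# Higher F-scores indicate a stronger univariate linear relationship with the target
print("F-scores:", np.round(selector_demo.scores_, 1))
print("Kept feature indices:", selector_demo.get_support(indices=True))
```

f_regression scores each feature on its own, so it can overlook features that are only informative in combination with others.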


Hyperparameter tuning

Use grid search or random search to find the best model parameters. GridSearchCV and RandomizedSearchCV automate this process:



In [None]:
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Tune the Ridge model's alpha parameter
ridge_params = {'alpha': [0.1, 0.5, 1.0, 5.0, 10.0]}
ridge_search = GridSearchCV(Ridge(), ridge_params, scoring='neg_mean_squared_error', cv=5)
ridge_search.fit(X_train, y_train)
best_ridge = ridge_search.best_estimator_
y_pred_tuned = best_ridge.predict(X_test)
mse_tuned = mean_squared_error(y_test, y_pred_tuned)
print(f"Mean Squared Error with Tuned Ridge: {mse_tuned}")
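
The cell above uses only GridSearchCV. For comparison, here is a minimal sketch of the random-search variant, which samples alpha from a distribution instead of a fixed grid. The synthetic data, the log-uniform range, and n_iter=20 are illustrative assumptions.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(0)

# Synthetic stand-in for the training data
X_demo = rng.normal(size=(200, 5))
y_demo = X_demo @ np.array([3.0, -2.0, 0.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=200)

# Sample alpha from a log-uniform distribution between 1e-3 and 1e2
param_distributions = {"alpha": loguniform(1e-3, 1e2)}

random_search = RandomizedSearchCV(
    Ridge(),
    param_distributions,
    n_iter=20,  # number of sampled candidates
    scoring="neg_mean_squared_error",
    cv=5,
    random_state=0,
)
random_search.fit(X_demo, y_demo)

# best_score_ is the negated MSE, so flip the sign for reporting
print("Best alpha:", random_search.best_params_["alpha"])
print("Best cross-validated MSE:", -random_search.best_score_)
```

Random search tends to be the more economical choice when there are several hyperparameters, because the number of fits is fixed by n_iter rather than by the size of the grid.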
