# Machine Learning Prediction and Analysis on EdX Courses Dataset

This notebook demonstrates dataset preparation, feature engineering, visualization, regression tasks (predicting course rating and price), classification (predicting course level), clustering analysis, comprehensive insights, and recommendations using an EdX courses dataset.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score, classification_report, confusion_matrix
from sklearn.metrics import silhouette_score, davies_bouldin_score

from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, RandomForestClassifier
from sklearn.svm import SVR, SVC
from sklearn.neighbors import KNeighborsRegressor, KNeighborsClassifier
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.decomposition import PCA

from xgboost import XGBRegressor
from sklearn.feature_selection import SelectKBest, f_regression, RFE

import warnings
warnings.filterwarnings('ignore')

%matplotlib inline

## 1. Dataset Preparation

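Load the dataset from an inline CSV string and inspect its shape, data types, summary statistics, and missing values. The sample contains eight courses, and the `rating` and `survey_count` columns appear to be pre-scaled to the 0 to 1 range.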
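
Three rows in the sample share exactly the same `rating` and `survey_count` values, which may indicate imputed placeholders rather than observed data (this is an assumption; the source does not say). A minimal sketch of how such repeats could be flagged, using a toy frame with illustrative values:

```python
import pandas as pd

# Toy frame standing in for the course table; the values are illustrative only.
toy = pd.DataFrame({
    "course_name": ["a", "b", "c", "d"],
    "rating": [0.89, 0.47, 0.47, 0.67],
    "survey_count": [0.0002, 0.0588, 0.0588, 0.0046],
})

# Flag every row whose (rating, survey_count) pair occurs more than once.
repeated = toy.duplicated(subset=["rating", "survey_count"], keep=False)
suspect_rows = toy.loc[repeated, "course_name"].tolist()
print(suspect_rows)
```

Applied to `df`, the same `duplicated(..., keep=False)` call would list the courses worth a closer look before modeling.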

In [None]:
# Load and inspect the dataset
data = '''url,course_name,rating,survey_count,Institution,Level,Associated_skills,price,duration_days
https://www.edx.org/learn/architecture-history/massachusetts-institute-of-technology-a-global-history-of-architecture,mitx: a global history of architecture,0.8887569237884039,0.00022518911862987176,MITx,Introductory,architectural history,149.0,91.0
https://www.edx.org/learn/architecture/harvard-university-the-architectural-imagination,harvardx: the architectural imagination,0.666270771365209,0.004600291994867381,HarvardX,Introductory,"imagination, innovation, perspective (graphical), value systems",249.0,70.0
https://www.edx.org/learn/design/delft-university-of-technology-introduction-to-ai-in-architectural-design,delftx: ai in architectural design: introduction,0.46650830854608394,0.058757090348987476,DelftX,Introductory,"architectural design, artificial intelligence, computer vision, design thinking, enthusiasm, machine learning, open source technology, python (programming language)",109.0,56.0
https://www.edx.org/learn/architecture/the-university-of-tokyo-four-facets-of-contemporary-japanese-architecture-theory,utokyox: four facets of contemporary japanese architecture: theory,0.46650830854608394,0.058757090348987476,UTokyoX,Intermediate,advertisement,119.0,56.0
https://www.edx.org/learn/history/the-university-of-tokyo-tokyo-hillside-tokyo-riverside-exploring-the-historical-city,"utokyox: tokyo hillside, tokyo riverside: exploring the historical city",0.777513847576806,0.00012867949635992672,UTokyoX,Introductory,"lecturing, social development",59.0,42.0
https://www.edx.org/learn/nosql/ibm-nosql-database-basics,ibm: nosql database basics,0.33254154273041703,0.002123211689938791,IBM,Introductory,"agile methodology, apache cassandra, big data, cloudant, data as a service (daas), database management, database permissions, database as a service (dbaas), management, mongodb, nosql, relational databases, scalability",99.0,35.0
https://www.edx.org/learn/judaism/university-of-pennsylvania-the-tabernacle-in-word-image-an-italian-jewish-manuscript-revealed,pennx: the tabernacle in word & image: an italian jewish manuscript revealed,0.46650830854608394,0.058757090348987476,PennX,Advanced,hebrew language,29.0,35.0
https://www.edx.org/learn/architecture/tokyo-institute-of-technology-japanese-architecture-and-structural-design,tokyotechx: japanese architecture and structural design,0.44378461894201404,0.0008685866004295053,TokyoTechX,Intermediate,"aesthetics, environmentalism, metabolism, roofing, seismic analysis, seismic retrofit, seismology, shear (sheet metal), structural engineering",79.0,35.0'''

from io import StringIO
df = pd.read_csv(StringIO(data))

print("Dataset Shape:", df.shape)
print("\nData Types:")
print(df.dtypes)

print("\nSummary Statistics:")
print(df.describe())

print("\nMissing Values:")
print(df.isnull().sum())

## Feature Engineering

Enhance the dataset with new features: URL domain, subject area, skill count, a tech-related flag, price tier, duration tier, price per day, and institution type. Then one-hot encode selected categorical variables and build a binary skills matrix.
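
As an alternative way to build the skills matrix, pandas can expand a delimited string column directly. The sketch below uses `Series.str.get_dummies` on a few skill strings taken from the sample; it matches whole skill tokens rather than substrings, so a skill such as `management` is not confused with `database management`. The variable names are illustrative.

```python
import pandas as pd

skills = pd.Series([
    "architectural history",
    "lecturing, social development",
    "big data, database management, nosql",
])

# Normalise case and the whitespace around commas, then expand to one
# indicator column per distinct skill token.
normalised = (
    skills.str.lower()
          .str.replace(r"\s*,\s*", ",", regex=True)
          .str.strip()
)
skill_matrix = normalised.str.get_dummies(sep=",")
print(skill_matrix.columns.tolist())
print(skill_matrix.sum(axis=1).tolist())  # number of skills per course
```

The row sums double as a skill count, which could stand in for the separate `skill_count` feature.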

In [None]:
# Extract domain from URL
df["domain"] = df["url"].apply(lambda x: x.split('/')[2])

# Extract subject area from URL
df["subject"] = df["url"].apply(lambda x: x.split('/')[4])

# Count skills for each course
df["skill_count"] = df["Associated_skills"].apply(lambda x: len(str(x).split(',')))

# Flag tech-related courses (programming, data, etc.).
# Whole-word matching keeps short keywords such as 'ai' from matching inside unrelated words.
tech_keywords = ['python', 'ai', 'artificial intelligence', 'data', 'programming', 'database', 'nosql']
tech_pattern = r'\b(?:' + '|'.join(tech_keywords) + r')\b'
df["is_tech"] = (
    df["Associated_skills"].astype(str).str.contains(tech_pattern, case=False, regex=True).astype(int)
)

# Create feature for price tier
df["price_tier"] = pd.cut(
    df["price"], 
    bins=[0, 50, 100, 150, 250], 
    labels=['Budget', 'Low', 'Medium', 'Premium']
)

# Create feature for duration tier
df["duration_tier"] = pd.cut(
    df["duration_days"], 
    bins=[0, 40, 60, 100], 
    labels=['Short', 'Medium', 'Long']
)

# Calculate price per day
df["price_per_day"] = df["price"] / df["duration_days"]

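# Infer institution type (heuristic: partner names ending in 'X', e.g. MITx, are treated as universities)
def get_institution_type(inst):
    # Checking only the suffix avoids matching an 'x' elsewhere in a company name
    if str(inst).lower().endswith('x'):
        return 'University'
    else:
        return 'Company'

df["institution_type"] = df["Institution"].apply(get_institution_type)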

# One-hot encode categorical variables
categorical_cols = ['Institution', 'Level', 'subject', 'price_tier', 'duration_tier', 'institution_type']
for col in categorical_cols:
    dummies = pd.get_dummies(df[col], prefix=col)
    df = pd.concat([df, dummies], axis=1)

# Build a skills matrix from exact skill tokens (1 if the course lists the skill, 0 if not)
skill_tokens = df['Associated_skills'].apply(
    lambda x: {s.strip().lower() for s in x.split(',')} if isinstance(x, str) else set()
)
unique_skills = sorted(set().union(*skill_tokens))

# Exact token matching keeps 'management' from also matching 'database management'
skill_matrix = pd.DataFrame(
    {
        f'skill_{skill.replace(" ", "_")}': skill_tokens.apply(lambda tokens, s=skill: int(s in tokens))
        for skill in unique_skills
    },
    index=df.index
)
df = pd.concat([df, skill_matrix], axis=1)

print("\nEnhanced Dataset Shape:", df.shape)
print("\nColumns After Feature Engineering:")
print(list(df.columns))

## Visualization of Key Features

Create scatter plots to visualize relationships between different features and the rating.

In [None]:
plt.figure(figsize=(15, 10))

# Plot 1: Price vs Rating
plt.subplot(2, 2, 1)
plt.scatter(df['price'], df['rating'], alpha=0.7)
plt.title('Price vs Rating')
plt.xlabel('Price ($)')
plt.ylabel('Rating')

# Plot 2: Duration vs Rating
plt.subplot(2, 2, 2)
plt.scatter(df['duration_days'], df['rating'], alpha=0.7)
plt.title('Duration vs Rating')
plt.xlabel('Duration (days)')
plt.ylabel('Rating')

# Plot 3: Price per Day vs Rating
plt.subplot(2, 2, 3)
plt.scatter(df['price_per_day'], df['rating'], alpha=0.7)
plt.title('Price per Day vs Rating')
plt.xlabel('Price per Day')
plt.ylabel('Rating')

# Plot 4: Skill Count vs Rating
plt.subplot(2, 2, 4)
plt.scatter(df['skill_count'], df['rating'], alpha=0.7)
plt.title('Skill Count vs Rating')
plt.xlabel('Number of Skills')
plt.ylabel('Rating')

plt.tight_layout()
plt.savefig('feature_relationships.png')
plt.show()

## 2. REGRESSION TASK: Predicting Course Rating

Split features and target for rating prediction, compare different regression models, and evaluate model performance.
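
With eight courses and `test_size=0.25`, the hold-out set contains only two courses, so RMSE and R² computed on it are very noisy. One possible complement is leave-one-out cross-validation, sketched here on synthetic data. The data, the choice of `Ridge`, and its `alpha` are assumptions for illustration, not results from the course dataset.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Synthetic stand-in with the same number of rows as the sample (8).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
y = X @ np.array([0.5, -0.2]) + rng.normal(scale=0.05, size=8)

# Each row is predicted by a model trained on the other seven rows.
preds = cross_val_predict(Ridge(alpha=1.0), X, y, cv=LeaveOneOut())
loo_rmse = float(np.sqrt(mean_squared_error(y, preds)))
print(f"Leave-one-out RMSE: {loo_rmse:.4f}")
```

Every observation serves as a test point exactly once, which uses a small sample more fully than a single split does.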

In [None]:
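# Split features and target for rating prediction.
# Keep only numeric/boolean columns: the raw string columns (domain, subject,
# Institution, institution_type) cannot be passed to the estimators' fit().
X_rating = (
    df.drop(columns=['url', 'course_name', 'rating', 'Associated_skills', 'Level', 'price_tier', 'duration_tier'])
      .select_dtypes(include=['number', 'bool'])
      .astype(float)
)
y_rating = df['rating']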

# Simple train-test split
X_rating_train, X_rating_test, y_rating_train, y_rating_test = train_test_split(
    X_rating, y_rating, test_size=0.25, random_state=42
)

# Define regression models to compare
regression_models = {
    'Linear Regression': LinearRegression(),
    'Ridge Regression': Ridge(),
    'Lasso Regression': Lasso(),
    'ElasticNet': ElasticNet(),
    'Random Forest': RandomForestRegressor(random_state=42),
    'Gradient Boosting': GradientBoostingRegressor(random_state=42),
    'SVR': SVR(),
    'KNN': KNeighborsRegressor()
}

# Train and evaluate each model
rating_results = {}

for name, model in regression_models.items():
    model.fit(X_rating_train, y_rating_train)
    y_pred = model.predict(X_rating_test)
    mse = mean_squared_error(y_rating_test, y_pred)
    rmse = np.sqrt(mse)
    r2 = r2_score(y_rating_test, y_pred)
    rating_results[name] = {
        'RMSE': rmse,
        'R2': r2,
        'Model': model
    }

print("\n=== REGRESSION TASK: Predicting Course Rating ===")
print("\nModel Performance Comparison:")
for name, metrics in rating_results.items():
    print(f"{name}: RMSE = {metrics['RMSE']:.4f}, R² = {metrics['R2']:.4f}")

# Identify best performing model for rating prediction
best_rating_model = min(rating_results.items(), key=lambda x: x[1]['RMSE'])
print(f"\nBest Model for Rating Prediction: {best_rating_model[0]}")
print(f"RMSE: {best_rating_model[1]['RMSE']:.4f}")
print(f"R²: {best_rating_model[1]['R2']:.4f}")

# Feature importance for Random Forest model (if available)
if 'Random Forest' in regression_models:
    rf_model = rating_results['Random Forest']['Model']
    feature_importance = pd.DataFrame({
        'Feature': X_rating.columns,
        'Importance': rf_model.feature_importances_
    })
    feature_importance = feature_importance.sort_values('Importance', ascending=False)
    
    print("\nTop 10 Features for Rating Prediction:")
    print(feature_importance.head(10))
    
    plt.figure(figsize=(10, 6))
    plt.barh(feature_importance['Feature'].head(10), feature_importance['Importance'].head(10))
    plt.xlabel('Importance')
    plt.title('Top 10 Features for Rating Prediction')
    plt.tight_layout()
    plt.savefig('rating_feature_importance.png')
    plt.show()

In [None]:
# Hyperparameter tuning for the best model (if Random Forest is best)
best_model_name = best_rating_model[0]

if best_model_name == 'Random Forest':
    param_grid = {
        'n_estimators': [50, 100, 200],
        'max_depth': [None, 10, 20, 30],
        'min_samples_split': [2, 5, 10],
        'min_samples_leaf': [1, 2, 4]
    }
    grid_search = GridSearchCV(
        RandomForestRegressor(random_state=42),
        param_grid=param_grid,
        cv=3,
        scoring='neg_mean_squared_error',
        n_jobs=-1
    )
    grid_search.fit(X_rating_train, y_rating_train)
    
    print(f"\nBest Hyperparameters for {best_model_name}:")
    print(grid_search.best_params_)
    
    best_rf = grid_search.best_estimator_
    tuned_y_pred = best_rf.predict(X_rating_test)
    tuned_rmse = np.sqrt(mean_squared_error(y_rating_test, tuned_y_pred))
    tuned_r2 = r2_score(y_rating_test, tuned_y_pred)
    
    print(f"\nTuned Model Performance:")
    print(f"RMSE: {tuned_rmse:.4f}")
    print(f"R²: {tuned_r2:.4f}")

## 3. REGRESSION TASK: Predicting Course Price

Now we predict course price using selected regression models.
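
One thing to watch in price prediction is target leakage: `price_per_day` and the one-hot `price_tier_*` columns are computed from `price`, so leaving any of them in the feature matrix lets a model recover the target directly. A minimal sketch of a helper that removes such columns by prefix follows; the function name `drop_target_derived` and the toy values are hypothetical.

```python
import pandas as pd

def drop_target_derived(frame, target, derived_prefixes):
    """Return the frame without the target and any column derived from it."""
    leaky = [
        col for col in frame.columns
        if col == target or any(col.startswith(p) for p in derived_prefixes)
    ]
    return frame.drop(columns=leaky)

# Toy frame mimicking a few engineered columns.
toy = pd.DataFrame({
    "price": [149.0, 59.0],
    "price_per_day": [1.64, 1.40],
    "price_tier_Low": [0, 1],
    "price_tier_Medium": [1, 0],
    "duration_days": [91.0, 42.0],
})

features = drop_target_derived(toy, "price", ["price_tier", "price_per_day"])
print(features.columns.tolist())
```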

In [None]:
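# Split features and target for price prediction (numeric/boolean columns only)
X_price = (
    df.drop(columns=['url', 'course_name', 'price', 'Associated_skills', 'price_tier', 'price_per_day'])
      .select_dtypes(include=['number', 'bool'])
      .astype(float)
)
# The one-hot price_tier_* columns are derived from price and would leak the target
X_price = X_price.loc[:, ~X_price.columns.str.startswith('price_tier_')]
y_price = df['price']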

X_price_train, X_price_test, y_price_train, y_price_test = train_test_split(
    X_price, y_price, test_size=0.25, random_state=42
)

# Define regression models for price prediction
price_models = {
    'Linear Regression': LinearRegression(),
    'Ridge Regression': Ridge(),
    'Random Forest': RandomForestRegressor(random_state=42),
    'Gradient Boosting': GradientBoostingRegressor(random_state=42)
}

price_results = {}

for name, model in price_models.items():
    model.fit(X_price_train, y_price_train)
    y_pred = model.predict(X_price_test)
    mse = mean_squared_error(y_price_test, y_pred)
    rmse = np.sqrt(mse)
    r2 = r2_score(y_price_test, y_pred)
    price_results[name] = {
        'RMSE': rmse,
        'R2': r2,
        'Model': model
    }

print("\n=== REGRESSION TASK: Predicting Course Price ===")
print("\nModel Performance Comparison:")
for name, metrics in price_results.items():
    print(f"{name}: RMSE = {metrics['RMSE']:.4f}, R² = {metrics['R2']:.4f}")

best_price_model = min(price_results.items(), key=lambda x: x[1]['RMSE'])
print(f"\nBest Model for Price Prediction: {best_price_model[0]}")
print(f"RMSE: {best_price_model[1]['RMSE']:.4f}")
print(f"R²: {best_price_model[1]['R2']:.4f}")

if best_price_model[0] == 'Random Forest':
    rf_price_model = best_price_model[1]['Model']
    price_importance = pd.DataFrame({
        'Feature': X_price.columns,
        'Importance': rf_price_model.feature_importances_
    })
    price_importance = price_importance.sort_values('Importance', ascending=False)
    
    print("\nTop 10 Features for Price Prediction:")
    print(price_importance.head(10))
    
    plt.figure(figsize=(10, 6))
    plt.barh(price_importance['Feature'].head(10), price_importance['Importance'].head(10))
    plt.xlabel('Importance')
    plt.title('Top 10 Features for Price Prediction')
    plt.tight_layout()
    plt.savefig('price_feature_importance.png')
    plt.show()

## 4. CLASSIFICATION TASK: Predicting Course Level

Prepare features and target for classification, build models and evaluate accuracy.
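
Accuracy is easier to interpret against a majority-class baseline, because the levels in the sample are imbalanced. The sketch below computes that baseline over all eight courses (not over the test split) from the `Level` values listed in the CSV above.

```python
from collections import Counter

# Level labels of the eight courses in the sample, in row order.
levels = [
    "Introductory", "Introductory", "Introductory", "Intermediate",
    "Introductory", "Introductory", "Advanced", "Intermediate",
]

counts = Counter(levels)
majority_label, majority_count = counts.most_common(1)[0]

# Accuracy obtained by always predicting the most frequent level.
baseline_accuracy = majority_count / len(levels)
print(majority_label, baseline_accuracy)
```

A classifier that does not clearly beat this figure has learned little beyond the class frequencies.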

In [None]:
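# Prepare features and target for classification (numeric/boolean columns only;
# string and categorical columns cannot be passed to the classifiers' fit())
X_class = (
    df.drop(columns=['url', 'course_name', 'Level', 'Associated_skills', 'Level_Advanced', 'Level_Intermediate', 'Level_Introductory'])
      .select_dtypes(include=['number', 'bool'])
      .astype(float)
)
y_class = df['Level']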

X_class_train, X_class_test, y_class_train, y_class_test = train_test_split(
    X_class, y_class, test_size=0.25, random_state=42
)

# Define classification models
classification_models = {
    'Random Forest': RandomForestClassifier(random_state=42),
    'SVC': SVC(probability=True),
    'KNN': KNeighborsClassifier()
}

class_results = {}

for name, model in classification_models.items():
    model.fit(X_class_train, y_class_train)
    y_pred = model.predict(X_class_test)
    accuracy = accuracy_score(y_class_test, y_pred)
    class_results[name] = {
        'Accuracy': accuracy,
        'Model': model
    }

print("\n=== CLASSIFICATION TASK: Predicting Course Level ===")
print("\nModel Performance Comparison:")
for name, metrics in class_results.items():
    print(f"{name}: Accuracy = {metrics['Accuracy']:.4f}")

best_class_model = max(class_results.items(), key=lambda x: x[1]['Accuracy'])
print(f"\nBest Model for Level Classification: {best_class_model[0]}")
print(f"Accuracy: {best_class_model[1]['Accuracy']:.4f}")

y_pred = best_class_model[1]['Model'].predict(X_class_test)
print("\nClassification Report:")
print(classification_report(y_class_test, y_pred))

# Feature importance for Random Forest Classifier (if applicable)
if 'Random Forest' in classification_models:
    rf_class_model = class_results['Random Forest']['Model']
    class_importance = pd.DataFrame({
        'Feature': X_class.columns,
        'Importance': rf_class_model.feature_importances_
    })
    class_importance = class_importance.sort_values('Importance', ascending=False)
    
    print("\nTop 10 Features for Level Classification:")
    print(class_importance.head(10))
    
    plt.figure(figsize=(10, 6))
    plt.barh(class_importance['Feature'].head(10), class_importance['Importance'].head(10))
    plt.xlabel('Importance')
    plt.title('Top 10 Features for Level Classification')
    plt.tight_layout()
    plt.savefig('level_feature_importance.png')
    plt.show()

## 5. CLUSTERING TASK: Grouping Similar Courses

Perform clustering on selected features, evaluate cluster performance, visualize clusters using PCA, and analyze cluster characteristics.
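
To make the silhouette score used for model selection more concrete, here is a small hand-rolled version on four one-dimensional points. For each point, `a` is the mean distance to the other members of its own cluster and `b` is the mean distance to the nearest other cluster; the score is `(b - a) / max(a, b)`. The function name is illustrative, and the sketch assumes at least two clusters.

```python
import numpy as np

def silhouette_by_hand(points, labels):
    """Mean silhouette coefficient with Euclidean distance (needs >= 2 clusters)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distance matrix.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    scores = []
    for i in range(len(points)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            scores.append(0.0)  # singleton cluster: silhouette is defined as 0
            continue
        a = dist[i, same].mean()
        b = min(
            dist[i, labels == other].mean()
            for other in np.unique(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

points = [[0.0], [1.0], [10.0], [11.0]]
labels = [0, 0, 1, 1]
score = silhouette_by_hand(points, labels)
print(f"Silhouette: {score:.4f}")
```

Two tight, well-separated groups give a value close to 1, while overlapping clusters push it toward 0 or below.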

In [None]:
# Prepare data for clustering
cluster_features = ['rating', 'price', 'duration_days', 'survey_count', 'skill_count', 'price_per_day', 'is_tech']
X_cluster = df[cluster_features]

# Scale the data
scaler = StandardScaler()
X_cluster_scaled = scaler.fit_transform(X_cluster)

# Define clustering models
clustering_models = {
    'KMeans-2': KMeans(n_clusters=2, random_state=42),
    'KMeans-3': KMeans(n_clusters=3, random_state=42),
    'KMeans-4': KMeans(n_clusters=4, random_state=42),
    'Agglomerative-3': AgglomerativeClustering(n_clusters=3)
}

cluster_results = {}

for name, model in clustering_models.items():
    labels = model.fit_predict(X_cluster_scaled)
    df[f'cluster_{name}'] = labels
    if len(set(labels)) > 1:
        silhouette = silhouette_score(X_cluster_scaled, labels)
        davies_bouldin = davies_bouldin_score(X_cluster_scaled, labels)
        cluster_results[name] = {
            'Silhouette': silhouette,
            'Davies-Bouldin': davies_bouldin,
            'Labels': labels
        }

print("\n=== CLUSTERING TASK: Grouping Similar Courses ===")
print("\nClustering Model Performance:")
for name, metrics in cluster_results.items():
    print(f"{name}: Silhouette = {metrics['Silhouette']:.4f}, Davies-Bouldin = {metrics['Davies-Bouldin']:.4f}")

best_cluster_model = max(cluster_results.items(), key=lambda x: x[1]['Silhouette'])
print(f"\nBest Clustering Model: {best_cluster_model[0]}")
print(f"Silhouette Score: {best_cluster_model[1]['Silhouette']:.4f}")

# Visualize clusters using PCA
pca = PCA(n_components=2)
X_cluster_pca = pca.fit_transform(X_cluster_scaled)

best_labels = best_cluster_model[1]['Labels']
plt.figure(figsize=(10, 6))
for i in range(max(best_labels) + 1):
    plt.scatter(
        X_cluster_pca[best_labels == i, 0],
        X_cluster_pca[best_labels == i, 1],
        label=f'Cluster {i}',
        alpha=0.7
    )

plt.title(f'PCA Visualization of Clusters ({best_cluster_model[0]})')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.tight_layout()
plt.savefig('cluster_visualization.png')
plt.show()

# Analyze cluster characteristics
print("\nCluster Characteristics:")
cluster_column = f'cluster_{best_cluster_model[0]}'
cluster_profiles = df.groupby(cluster_column)[cluster_features].mean()
print(cluster_profiles)

# Map clusters to meaningful names
cluster_names = {}
for cluster_id, profile in cluster_profiles.iterrows():
    if profile['price'] > 150:
        cluster_names[cluster_id] = "Premium Courses"
    elif profile['rating'] > 0.7:
        cluster_names[cluster_id] = "High-Rated Courses"
    elif profile['is_tech'] > 0.5:
        cluster_names[cluster_id] = "Technical Courses"
    else:
        cluster_names[cluster_id] = f"General Courses (Cluster {cluster_id})"

print("\nCluster Interpretations:")
for cluster_id, name in cluster_names.items():
    print(f"Cluster {cluster_id}: {name}")

## 6. COMPREHENSIVE ANALYSIS AND INSIGHTS

Analyze various correlations, institution performance, skills demand, subject areas, and compare tech vs non-tech courses.
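
The correlation checks use fixed thresholds such as 0.5, but with only eight courses even a moderate correlation can arise by chance. A rough sketch of the standard t-test for a Pearson correlation is given below; the critical value 2.447 is the two-sided 5% point of Student's t with 6 degrees of freedom, taken from standard tables, and the function name is illustrative.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for a Pearson correlation r computed from n pairs."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

N_COURSES = 8
# Two-sided 5% critical value of Student's t with n - 2 = 6 degrees of freedom.
T_CRITICAL_DF6 = 2.447

t_at_half = correlation_t_statistic(0.5, N_COURSES)
# Smallest |r| that reaches the critical value for this sample size.
r_needed = T_CRITICAL_DF6 / math.sqrt(T_CRITICAL_DF6 ** 2 + (N_COURSES - 2))
print(f"t at r=0.5: {t_at_half:.3f}, |r| needed: {r_needed:.3f}")
```

For this sample size, a correlation of 0.5 gives a t statistic of about 1.41, below the critical value, and |r| would need to exceed roughly 0.71 to be significant at the 5% level.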

In [None]:
# 1. Price-Rating Relationship
corr_price_rating = df['price'].corr(df['rating'])
print(f"\n1. Price-Rating Correlation: {corr_price_rating:.4f}")
if corr_price_rating > 0.5:
    print("   Higher-priced courses tend to have better ratings")
elif corr_price_rating < -0.5:
    print("   Lower-priced courses tend to have better ratings")
else:
    print("   No strong correlation between price and rating")

# 2. Duration-Rating Relationship
corr_duration_rating = df['duration_days'].corr(df['rating'])
print(f"\n2. Duration-Rating Correlation: {corr_duration_rating:.4f}")
if corr_duration_rating > 0.5:
    print("   Longer courses tend to have better ratings")
elif corr_duration_rating < -0.5:
    print("   Shorter courses tend to have better ratings")
else:
    print("   No strong correlation between course duration and rating")

# 3. Institution Analysis
print("\n3. Institution Performance:")
inst_ratings = df.groupby('Institution')['rating'].mean().sort_values(ascending=False)
print(inst_ratings)

# 4. Skill Demand Analysis
top_skills = []
for skill in unique_skills:
    skill_col = f'skill_{skill.replace(" ", "_")}'
    if skill_col in df.columns:
        top_skills.append((skill, df[skill_col].sum()))

# Sorting the skills in descending order and taking top 5
top_skills = sorted(top_skills, key=lambda x: x[1], reverse=True)[:5]
print("\n4. Most Common Skills Across Courses:")
for skill, count in top_skills:
    print(f"   {skill}: {count} courses")

# 5. Subject Analysis
subject_stats = df.groupby('subject').agg({
    'rating': 'mean',
    'price': 'mean',
    'duration_days': 'mean',
    'course_name': 'count'
}).sort_values('course_name', ascending=False)

subject_stats.columns = ['Average Rating', 'Average Price', 'Average Duration', 'Count']
print("\n5. Subject Area Analysis:")
print(subject_stats)

# 6. Price-Duration Efficiency
price_duration_corr = df['price'].corr(df['duration_days'])
print(f"\n6. Price-Duration Correlation: {price_duration_corr:.4f}")
if price_duration_corr > 0.7:
    print("   Course prices strongly relate to their duration")
else:
    print("   Course prices do not strongly relate to their duration")

# 7. Tech vs Non-Tech Courses
tech_stats = df.groupby('is_tech').agg({
    'rating': 'mean',
    'price': 'mean',
    'duration_days': 'mean',
    'course_name': 'count'
})
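# Rename by label so this still works if only one of the two groups is present
tech_stats = tech_stats.rename(index={0: 'Non-Tech', 1: 'Tech'})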
tech_stats.columns = ['Average Rating', 'Average Price', 'Average Duration', 'Count']
print("\n7. Tech vs Non-Tech Course Comparison:")
print(tech_stats)

# 8. Level Difficulty Analysis
level_stats = df.groupby('Level').agg({
    'rating': 'mean',
    'price': 'mean',
    'duration_days': 'mean',
    'course_name': 'count'
})
level_stats.columns = ['Average Rating', 'Average Price', 'Average Duration', 'Count']
print("\n8. Course Level Analysis:")
print(level_stats)

# 9. Price Tier Analysis
tier_stats = df.groupby('price_tier').agg({
    'rating': 'mean',
    'duration_days': 'mean',
    'course_name': 'count'
})
tier_stats.columns = ['Average Rating', 'Average Duration', 'Count']
print("\n9. Price Tier Analysis:")
print(tier_stats)

# 10. Duration vs Skill Count
skill_duration_corr = df['skill_count'].corr(df['duration_days'])
print(f"\n10. Skill Count-Duration Correlation: {skill_duration_corr:.4f}")

## 7. CONCLUSIONS AND RECOMMENDATIONS

Summarize key findings and provide recommendations for EdX as well as course authors.

In [None]:
print("\n=== CONCLUSIONS AND RECOMMENDATIONS ===")

print("\nKey Findings:")
print("1. Course pricing strategies could potentially be informed by the price prediction model.")
print("2. The Random Forest feature importances above indicate which features are most associated with ratings in this sample.")
print("3. Courses can be grouped into clusters based on their characteristics; the silhouette scores above show how well separated they are.")
print("4. Technical and non-technical courses can be compared on pricing and duration using the comparison table in section 6.")

print("\nRecommendations for EdX:")
print("1. Consider adjusting prices for courses in specific subject areas to optimize revenue.")
print("2. Focus on the key factors identified for improving course ratings.")
print("3. Target new course development in the high-performing subject areas.")
print("4. Use the identified clusters to create more effective marketing segments.")

print("\nRecommendations for Course Authors:")
print("1. Include the most in-demand skills in course content.")
print("2. Optimize course duration based on subject matter and target audience.")
print("3. Consider the level-appropriate pricing strategies identified in our analysis.")

print("\nLimitations of This Analysis:")
print("1. Small dataset size limits the reliability of our models.")
print("2. Missing additional features like student demographics, completion rates, etc.")
print("3. Limited time range may not capture seasonal trends.")

print("\nFuture Work:")
print("1. Expand the dataset to include more courses and additional features.")
print("2. Implement time-based analysis to track course performance over time.")
print("3. Conduct A/B testing on price adjustments based on our recommendations.")
print("4. Develop a more sophisticated recommendation system for students based on our findings.")