# Phase 6: Conclusion & Business Recommendations



**Objective**: Summarize the findings from our Data Mining project.

**Recap**:
1.  **Data Cleaning**: We cleaned and combined ~3.38M records from the Google Play Store and the Apple App Store.
2.  **EDA**: We found that most apps are free, and that ratings are generally high, with the distribution skewed toward the top of the scale.
3.  **Clustering**: We identified distinct segments of apps (e.g., "Small & Popular", "Large Games").
4.  **Prediction**: We built models to predict Installs and High Ratings with reasonable accuracy.
5.  **Association Rules**: We discovered rules linking Price and Category to Ratings.
    


## 1. Final Dataset Overview


In [1]:
import pandas as pd
import os

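def get_data_path(filename):
    """Return the first existing candidate path for `filename`."""
    possible_paths = [
        f"../output/{filename}",
        f"output/{filename}",
        f"/Users/jatinbisen/Desktop/Data_mining/output/{filename}"
    ]
    for path in possible_paths:
        if os.path.exists(path):
            return path
    # Fail loudly instead of returning None, which pd.read_csv cannot read
    raise FileNotFoundError(f"{filename} not found in: {possible_paths}")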

# Load the final combined dataset to show the scale of analysis
combined_df = pd.read_csv(get_data_path('combined_cleaned.csv'))
print(f"Final Combined Dataset Shape: {combined_df.shape}")
print(f"Total Apps Analyzed: {len(combined_df)}")


Final Combined Dataset Shape: (3379885, 9)
Total Apps Analyzed: 3379885


## 2. Key Findings



### Android vs iOS
- **Price**: iOS apps are generally more expensive than Android apps.
- **Ratings**: Android ratings are heavily skewed towards 4.0-5.0, while iOS shows more variance.
- **Size**: Game apps are significantly larger on both platforms.
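
A platform comparison of this kind can be sketched with a single `groupby`. The snippet below is a minimal illustration on made-up rows, not the notebook's actual analysis; the column names `Platform`, `Price`, and `Rating` are assumptions about the combined dataset's schema.

```python
import pandas as pd

# Toy rows standing in for combined_cleaned.csv (values are invented;
# the column names Platform, Price, Rating are assumed, not confirmed).
apps = pd.DataFrame({
    "Platform": ["Android", "Android", "Android", "iOS", "iOS", "iOS"],
    "Price": [0.0, 0.0, 0.99, 0.0, 2.99, 4.99],
    "Rating": [4.5, 4.2, 4.8, 3.1, 4.6, 2.4],
})


def platform_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-platform mean price, rating spread, and share of ratings >= 4.0."""
    return df.groupby("Platform").agg(
        mean_price=("Price", "mean"),
        rating_std=("Rating", "std"),
        share_rated_4_plus=("Rating", lambda s: (s >= 4.0).mean()),
    )


summary = platform_summary(apps)
print(summary)
```

The standard deviation gives a rough read on "more variance", while the share of ratings at or above 4.0 captures how concentrated ratings are at the top of the scale.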

### Success Factors
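- **Free vs Paid**: Free apps receive far more installs than paid apps.
- **Size**: Smaller apps (< 20MB) tend to have higher install rates. We read this as most relevant to emerging markets, but that is an inference from the Android data rather than a direct regional measurement.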
- **Category**: 'Game', 'Communication', and 'Social' are the most downloaded categories.
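
The free-versus-paid gap above can be checked with a comparison of medians, which is more robust than the mean when install counts are heavily skewed. This is a minimal sketch on invented numbers; the `Price` and `Installs` column names and the derived `Type` label are assumptions for illustration.

```python
import pandas as pd

# Invented sample; Price and Installs are assumed column names.
apps = pd.DataFrame({
    "Price": [0.0, 0.0, 0.0, 1.99, 4.99, 0.99],
    "Installs": [1_000_000, 50_000, 500_000, 1_000, 5_000, 100],
})

# Label each app as Free or Paid based on its price (hypothetical helper column).
apps["Type"] = apps["Price"].gt(0).map({True: "Paid", False: "Free"})

# The median resists the long right tail that a few blockbuster apps create.
median_installs = apps.groupby("Type")["Installs"].median()
ratio = median_installs["Free"] / median_installs["Paid"]

print(median_installs)
print(f"Free/Paid median install ratio: {ratio:.0f}x")
```

Reporting the ratio of medians states how large the gap is instead of describing it loosely.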
    


## 3. Business Recommendations



If you are a developer or investor:
1.  **Go Freemium**: The barrier to entry for paid apps is high. Release for free with In-App Purchases.
2.  **Optimize Size**: Keep your app under 50MB to maximize downloads, especially on Android. Note that our install-rate finding applies to apps under 20MB, so smaller is better where feasible.
3.  **Focus on Quality**: High ratings (>4.0) are essential. Our association rules showed that low-rated apps rarely succeed.
4.  **Platform Strategy**: 
    - Launch on **Android** for reach (Volume).
    - Launch on **iOS** for revenue (Value).
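
To make the rule behind the quality recommendation concrete, the sketch below computes support, confidence, and lift for one rule by hand. It is not the Apriori search used in the project, only the metric arithmetic for a single rule; the item labels (`rating=low`, `installs=low`) and the eight toy transactions are invented for illustration.

```python
from typing import Dict, FrozenSet, List


def rule_metrics(
    transactions: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Dict[str, float]:
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    n = len(transactions)
    n_antecedent = sum(antecedent <= t for t in transactions)
    n_consequent = sum(consequent <= t for t in transactions)
    n_both = sum((antecedent | consequent) <= t for t in transactions)

    support = n_both / n
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    # Lift > 1 means the consequent is more likely when the antecedent holds.
    lift = confidence / (n_consequent / n) if n_consequent else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Each app is a set of discretized attributes (hypothetical labels).
apps = [
    frozenset({"rating=low", "installs=low"}),
    frozenset({"rating=low", "installs=low"}),
    frozenset({"rating=low", "installs=high"}),
    frozenset({"rating=high", "installs=high"}),
    frozenset({"rating=high", "installs=high"}),
    frozenset({"rating=high", "installs=low"}),
    frozenset({"rating=high", "installs=high"}),
    frozenset({"rating=low", "installs=low"}),
]

metrics = rule_metrics(
    apps, frozenset({"rating=low"}), frozenset({"installs=low"})
)
print(metrics)
```

A rule is only useful as evidence when its lift is clearly above 1; high confidence alone can simply reflect a consequent that is common everywhere.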
    


## 4. Limitations



- **Data Age**: The dataset might not reflect the most current trends (2024/2025).
- **Proxy Metrics**: For iOS, we used `Reviews` as a proxy for popularity since `Installs` was missing. This is an imperfect measure.
- **Missing Data**: A significant portion of 'Developer Website' and 'Privacy Policy' data was missing.
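
One way to handle the proxy-metric limitation is to keep the source of each popularity value explicit and to compare apps only within their own platform. The sketch below assumes hypothetical `Platform`, `Installs`, and `Reviews` columns and invented values; the derived columns `PopularityRaw`, `PopularitySource`, and `PopularityPct` are names introduced here, not ones from the notebook.

```python
import pandas as pd

# Invented sample: iOS rows have no install counts, as described above.
apps = pd.DataFrame({
    "Platform": ["Android", "Android", "Android", "iOS", "iOS", "iOS"],
    "Installs": [1_000_000, 10_000, 500_000, None, None, None],
    "Reviews": [20_000, 150, 9_000, 300, 12_000, 4_500],
})

# Use Installs where present, otherwise fall back to Reviews as the proxy.
apps["PopularityRaw"] = apps["Installs"].fillna(apps["Reviews"])
apps["PopularitySource"] = apps["Installs"].notna().map(
    {True: "Installs", False: "Reviews"}
)

# Installs and Reviews live on different scales, so rank within each platform
# instead of comparing raw values across platforms.
apps["PopularityPct"] = apps.groupby("Platform")["PopularityRaw"].rank(pct=True)

# Fraction of missing values per column, to quantify gaps in the data.
missing_share = apps.isna().mean()

print(apps)
print(missing_share)
```

Percentile ranks make "popular on iOS" and "popular on Android" comparable in relative terms, but they do not turn review counts into install counts, so the proxy remains imperfect.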
    


## 5. Future Work



- **Sentiment Analysis**: Analyze the text of user reviews to understand *why* users rate apps highly.
- **Time Series Analysis**: Track how app popularity changes over time.
- **Deep Learning**: Use Neural Networks for better prediction accuracy.
    


## 6. References



1.  Google Play Store Dataset (Kaggle).
2.  Apple App Store Dataset (Kaggle).
3.  Scikit-Learn Documentation.
4.  Mlxtend Documentation for Apriori Algorithm.
    
