# Data Exploration for Recommendation System

This notebook explores the dataset we'll use for our recommendation system.

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

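# Set plot style. The seaborn theme is applied second, so it overrides
# any overlapping settings from matplotlib's 'ggplot' style.
plt.style.use('ggplot')
sns.set_theme(style="whitegrid")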

## Loading Sample Data

For this initial exploration, we'll create a small sample dataset of five articles. Each article has an `id`, a `title`, a one-sentence `content` field, a `category`, and a list of `tags`.

In [None]:
# Create a sample dataset
articles = [
    {
        "id": 1,
        "title": "Introduction to Machine Learning",
        "content": "Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed.",
        "category": "AI",
        "tags": ["machine learning", "AI", "algorithms"]
    },
    {
        "id": 2,
        "title": "Deep Learning Fundamentals",
        "content": "Deep learning is a subset of machine learning that uses neural networks with many layers.",
        "category": "AI",
        "tags": ["deep learning", "neural networks", "AI"]
    },
    {
        "id": 3,
        "title": "Python for Data Science",
        "content": "Python is a popular programming language for data science due to its simplicity and powerful libraries.",
        "category": "Programming",
        "tags": ["python", "programming", "data science"]
    },
    {
        "id": 4,
        "title": "Natural Language Processing",
        "content": "NLP is a field of AI that focuses on the interaction between computers and human language.",
        "category": "AI",
        "tags": ["NLP", "AI", "text processing"]
    },
    {
        "id": 5,
        "title": "Data Visualization Techniques",
        "content": "Data visualization is the graphical representation of information and data using visual elements.",
        "category": "Data Science",
        "tags": ["visualization", "data science", "charts"]
    }
]

# Convert to DataFrame
df_articles = pd.DataFrame(articles)
df_articles

## Basic Data Analysis

Let's examine the structure of our dataset: its shape, its columns, and how the articles are distributed across categories.

In [None]:
# Display basic information about the dataset
print("Dataset shape:", df_articles.shape)
print("\nDataset columns:")
for col in df_articles.columns:
    print(f"- {col}")

print("\nCategory distribution:")
print(df_articles['category'].value_counts())

## Next Steps

In the next notebook, we'll:
1. Process the text data using NLP techniques
2. Create vector representations of the articles
3. Build a simple content-based recommendation system
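
To give a rough idea of where these steps lead, here is a minimal sketch using only the standard library. It assumes raw term counts as the vector representation and cosine similarity as the ranking measure; the next notebook may well use something richer, such as TF-IDF weighting or stop-word removal. The names `contents`, `tokenize`, `cosine`, and `recommend` are illustrative, not part of the original notebook.

```python
import math
import re
from collections import Counter

# Article texts copied from the sample dataset above (article id -> content).
contents = {
    1: "Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed.",
    2: "Deep learning is a subset of machine learning that uses neural networks with many layers.",
    3: "Python is a popular programming language for data science due to its simplicity and powerful libraries.",
    4: "NLP is a field of AI that focuses on the interaction between computers and human language.",
    5: "Data visualization is the graphical representation of information and data using visual elements.",
}


def tokenize(text):
    """Step 1: lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Step 2: represent each article as a bag-of-words count vector.
vectors = {aid: Counter(tokenize(text)) for aid, text in contents.items()}


def recommend(article_id, top_n=2):
    """Step 3: return the ids of the top_n articles most similar to article_id."""
    scores = {
        other: cosine(vectors[article_id], vectors[other])
        for other in vectors
        if other != article_id
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend(1))
```

Because common words such as "is" and "a" are counted like any other term, the scores from this sketch are only a rough signal. That limitation is one reason to apply proper NLP preprocessing before building the real system.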