# -----------------------------------
##### User-Based Collaborative Filtering Recommendation System
# -----------------------------------
- This notebook demonstrates a simple implementation of a user-based collaborative filtering recommendation system. The system suggests movies to a user based on the preferences of other users with similar taste. It operates on two datasets:
   - 1. 'ratings.csv' - Contains movie ratings given by different users.
   - 2. 'movies.csv' - Contains movie information, such as titles and genres.
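
To make the idea concrete before working with the real files, here is a minimal sketch of user-based collaborative filtering on a tiny, hypothetical ratings table (the values below are invented for illustration and are not taken from 'ratings.csv'). It assumes unrated movies are filled with 0 and that only the single most similar user is consulted; the helper names `most_similar_user` and `recommend` are illustrative, not part of any library.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy ratings (illustration only, not from ratings.csv).
toy = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 3, 3, 3],
    "movieId": [10, 20, 30, 10, 20, 10, 30, 40],
    "rating":  [5.0, 4.0, 5.0, 5.0, 4.0, 1.0, 5.0, 4.0],
})

# User-item matrix: one row per user, one column per movie, 0 = not rated.
user_item = toy.pivot(index="userId", columns="movieId", values="rating").fillna(0)

# Pairwise cosine similarity between the users' rating vectors.
sim = pd.DataFrame(
    cosine_similarity(user_item),
    index=user_item.index,
    columns=user_item.index,
)

def most_similar_user(user_id):
    """Return the id of the other user with the highest cosine similarity."""
    return sim.loc[user_id].drop(user_id).idxmax()

def recommend(user_id):
    """Movies the nearest neighbour rated that user_id has not rated yet."""
    neighbour = most_similar_user(user_id)
    mask = (user_item.loc[user_id] == 0) & (user_item.loc[neighbour] > 0)
    return user_item.loc[neighbour][mask].sort_values(ascending=False)

print(most_similar_user(2))
print(recommend(2))
```

A fuller system would typically average over several neighbours and weight their ratings by similarity; a single neighbour is used here only to keep the sketch short.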

### Key Steps:
#### 1- Load the required libraries
- 1. pandas: A library for data manipulation and analysis, used for working with structured data such as data frames.
- 2. numpy: A fundamental library for numerical computation in Python. It provides support for arrays and matrices, along with mathematical functions.
- 3. scipy.stats: SciPy's statistics module, used for probability distributions, hypothesis testing, and descriptive statistics.
- 4. seaborn and matplotlib: Visualization libraries; seaborn is built on top of matplotlib.
- 5. cosine_similarity: A function from scikit-learn that computes the pairwise cosine similarity between the rows of one or two matrices. Commonly used in recommendation systems to measure how similar users or items are based on their rating vectors.
- 6. csr_matrix: A compressed sparse row matrix from SciPy, used for efficient storage and computation on large, sparse matrices where most elements are zero. It reduces memory usage and can speed up operations such as cosine similarity.
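
As a small illustration of the last two items, the sketch below builds a hypothetical 3x4 ratings matrix (invented values), converts it to a `csr_matrix`, and checks that `cosine_similarity` gives the same result for the dense and sparse inputs. The variable names are illustrative only.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical ratings: rows are users, columns are movies, 0 = not rated.
dense = np.array([
    [5.0, 0.0, 0.0, 4.0],
    [4.0, 0.0, 0.0, 5.0],
    [0.0, 3.0, 0.0, 0.0],
])

# CSR stores only the non-zero entries.
sparse = csr_matrix(dense)
print(sparse.nnz)  # number of stored (non-zero) values

# cosine_similarity accepts both dense arrays and sparse matrices.
sim_dense = cosine_similarity(dense)
sim_sparse = cosine_similarity(sparse)

print(np.round(sim_sparse, 3))
```

Users 0 and 1 rated the same two movies with similar scores, so their similarity is close to 1, while user 2 shares no rated movie with either of them and gets a similarity of 0.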

In [1]:
# Data processing
import pandas as pd 
import numpy as np 
import scipy.stats
# Visualization
import seaborn as sns
import matplotlib.pyplot as plt
# Similarity
from sklearn.metrics.pairwise import cosine_similarity
from scipy.sparse import csr_matrix

#### 2- Load and merge the datasets to create a unified data source linking user ratings with movie titles.

In [2]:
reating = pd.read_csv("C:\\Users\\zahra\\Downloads\\Telegram Desktop\\ratings.csv")
reating

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931
...,...,...,...,...
100831,610,166534,4.0,1493848402
100832,610,168248,5.0,1493850091
100833,610,168250,5.0,1494273047
100834,610,168252,5.0,1493846352


In [3]:
reating.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100836 entries, 0 to 100835
Data columns (total 4 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   userId     100836 non-null  int64  
 1   movieId    100836 non-null  int64  
 2   rating     100836 non-null  float64
 3   timestamp  100836 non-null  int64  
dtypes: float64(1), int64(3)
memory usage: 3.1 MB


In [4]:
movies = pd.read_csv("C:\\Users\\zahra\\Downloads\\Telegram Desktop\\movies.csv")
movies

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy
...,...,...,...
9737,193581,Black Butler: Book of the Atlantic (2017),Action|Animation|Comedy|Fantasy
9738,193583,No Game No Life: Zero (2017),Animation|Comedy|Fantasy
9739,193585,Flint (2017),Drama
9740,193587,Bungo Stray Dogs: Dead Apple (2018),Action|Animation


In [5]:
movies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9742 entries, 0 to 9741
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   movieId  9742 non-null   int64 
 1   title    9742 non-null   object
 2   genres   9742 non-null   object
dtypes: int64(1), object(2)
memory usage: 228.5+ KB


In [6]:
unique_movies_ids = reating['movieId'].nunique()
# Print the result
print(f"Number of unique movie IDs: {unique_movies_ids}")

Number of unique movie IDs: 9724


In [7]:
unique_user_ids = reating['userId'].nunique()
# Print the result
print(f"Number of unique user IDs: {unique_user_ids}")

Number of unique user IDs: 610
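
The counts printed above also show how sparse a user-item matrix built from this data would be. The sketch below hard-codes the three numbers reported by the notebook (100836 ratings, 610 users, 9724 rated movies) so that it runs on its own; inside the notebook the same values could instead be taken from `len(reating)`, `unique_user_ids`, and `unique_movies_ids`.

```python
# Counts copied from the notebook outputs above.
n_ratings = 100836
n_users = 610
n_movies = 9724

# Number of cells in a full user x movie matrix.
possible = n_users * n_movies

# Fraction of cells that actually hold a rating.
density = n_ratings / possible

print(f"Possible user-movie pairs: {possible}")
print(f"Density: {density:.2%}")
```

Only about 1.7% of the possible user-movie pairs carry a rating, which is the kind of matrix a sparse format such as `csr_matrix` is designed for.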


In [8]:
# Inner join: keep only ratings whose movieId also appears in movies
df = pd.merge(reating, movies, on='movieId', how='inner')

# Take a look at the data
df

Unnamed: 0,userId,movieId,rating,timestamp,title,genres
0,1,1,4.0,964982703,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,1,3,4.0,964981247,Grumpier Old Men (1995),Comedy|Romance
2,1,6,4.0,964982224,Heat (1995),Action|Crime|Thriller
3,1,47,5.0,964983815,Seven (a.k.a. Se7en) (1995),Mystery|Thriller
4,1,50,5.0,964982931,"Usual Suspects, The (1995)",Crime|Mystery|Thriller
...,...,...,...,...,...,...
100831,610,166534,4.0,1493848402,Split (2017),Drama|Horror|Thriller
100832,610,168248,5.0,1493850091,John Wick: Chapter Two (2017),Action|Crime|Thriller
100833,610,168250,5.0,1494273047,Get Out (2017),Horror
100834,610,168252,5.0,1493846352,Logan (2017),Action|Sci-Fi
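
Because an inner join silently drops ratings whose `movieId` has no match in the movies table, it can be worth checking the merge. The sketch below shows one possible check on small hypothetical frames (`toy_ratings` and `toy_movies` are invented for illustration); `validate` and `indicator` are standard `pd.merge` parameters.

```python
import pandas as pd

# Hypothetical frames; movieId 99 deliberately has no entry in toy_movies.
toy_ratings = pd.DataFrame({
    "userId": [1, 1, 2],
    "movieId": [1, 3, 99],
    "rating": [4.0, 4.0, 5.0],
})
toy_movies = pd.DataFrame({
    "movieId": [1, 3],
    "title": ["Toy Story (1995)", "Grumpier Old Men (1995)"],
})

# validate="many_to_one" raises MergeError if a movieId is duplicated in toy_movies.
inner = pd.merge(toy_ratings, toy_movies, on="movieId", how="inner",
                 validate="many_to_one")

# A left join with indicator=True reveals which ratings found no matching movie.
check = pd.merge(toy_ratings, toy_movies, on="movieId", how="left", indicator=True)
n_unmatched = int((check["_merge"] == "left_only").sum())

print(len(inner), "rows kept by the inner join")
print(n_unmatched, "rating(s) without a matching movie")
```

Applied to the real data, the same pattern would use `reating` and `movies` in place of the toy frames.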
