### 1. Write the Applications of Recommender Systems.

###### 1. E-Commerce
E-commerce is the industry where recommendation systems were first widely used. With millions of customers and detailed records of their online behavior, e-commerce companies are well placed to generate accurate recommendations.
###### 2. Retail
Back in the 2000s, Target unsettled shoppers when its systems were able to infer pregnancies from purchase patterns, in some cases before customers had told their own families. Shopping data is especially valuable because it is the most direct signal of a customer's intent. Retailers with troves of shopping data are at the forefront of making accurate recommendations.

###### 3. Media
Like e-commerce companies, media businesses were among the first to adopt recommendations. It is hard to find a news site without a recommendation system.

###### 4. Banking
Banking is a mass-market product consumed digitally by millions of people. Banking for consumers and SMEs (small and medium-sized enterprises) is a prime candidate for recommendations. Knowing a customer's detailed financial situation and past preferences, combined with data on thousands of similar users, is quite powerful.

###### 5. Telecom
Telecom shares similar dynamics with banking. Telcos have access to millions of customers whose every interaction is recorded. Their product range is also rather limited compared to other industries, which makes recommendation in telecom an easier problem.

###### 6. Utilities
Utilities share similar dynamics with telecom but offer an even narrower range of products, which makes recommendations relatively simple.

### 2. What Are the Data Collection Methods in Recommender Systems?

Data collection in recommender systems can be broadly classified into two categories:

###### Explicit Feedback:
This is the data that users consciously provide to the system. It includes ratings, reviews, likes, and dislikes. While explicit feedback is valuable as it directly reflects user preferences, it can be challenging to collect as it requires user effort.

###### Implicit Feedback: 
This is the data collected from user actions and behavior. It includes clicks, views, browsing history, and purchase history. Implicit feedback is easier to collect as it doesn't require any extra effort from the user. However, interpreting implicit feedback can be challenging, as the absence of an action doesn't necessarily indicate a lack of interest.
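To make the distinction concrete, here is a minimal pandas sketch using small hypothetical tables (the `explicit` and `events` frames and all their values are invented for illustration). Explicit ratings become a user-item matrix with gaps where nobody rated; implicit view events are counted and then binarized, where a zero only means that no interaction was observed.

```python
import pandas as pd

# Hypothetical explicit feedback: star ratings that users chose to give.
explicit = pd.DataFrame({
    "userId":  [1, 1, 2, 3],
    "movieId": [10, 20, 10, 30],
    "rating":  [4.0, 2.5, 5.0, 3.0],
})

# Hypothetical implicit feedback: one row per view event, no rating given.
events = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 3, 3],
    "movieId": [10, 10, 30, 20, 30, 30],
})

# Explicit: the rating itself is the preference signal (NaN = not rated).
explicit_matrix = explicit.pivot_table(index="userId", columns="movieId", values="rating")

# Implicit: count interactions per user-item pair, then treat any interaction
# as a positive signal. A 0 means "no observed interaction", not "dislike".
view_counts = events.groupby(["userId", "movieId"]).size().unstack(fill_value=0)
implicit_matrix = (view_counts > 0).astype(int)

print(explicit_matrix)
print(implicit_matrix)
```

The binarization step is only one choice; the raw counts in `view_counts` could also be kept as a rough strength-of-interest signal.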

### 3. Build a Basic Recommender System

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
```

```python
# Load the movie metadata and the user ratings
movie = pd.read_csv("movies.csv")
rating = pd.read_csv("ratings.csv")
```

```python
movie.head()
```

| | movieId | title | genres |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |


```python
rating.head()
```

| | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 16 | 4.0 | 1217897793 |
| 1 | 1 | 24 | 1.5 | 1217895807 |
| 2 | 1 | 32 | 4.0 | 1217896246 |
| 3 | 1 | 47 | 4.0 | 1217896556 |
| 4 | 1 | 50 | 4.0 | 1217896523 |


```python
# Join ratings to movie titles on the shared movieId column
df = pd.merge(movie, rating, on="movieId")
```

```python
df.head()
```

| | movieId | title | genres | userId | rating | timestamp |
|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 2 | 5.0 | 859046895 |
| 1 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 5 | 4.0 | 1303501039 |
| 2 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 8 | 5.0 | 858610933 |
| 3 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 11 | 4.0 | 850815810 |
| 4 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 14 | 4.0 | 851766286 |
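`pd.merge` defaults to an inner join (`how="inner"`), so `df` has one row per rating and silently drops any movie that nobody rated. The toy frames below are hypothetical stand-ins for `movies.csv` and `ratings.csv`, sketched to show the difference from a left join.

```python
import pandas as pd

# Toy stand-ins for movies.csv and ratings.csv (hypothetical values).
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Movie A", "Movie B", "Movie C"],
})
ratings = pd.DataFrame({
    "userId": [7, 8, 7],
    "movieId": [1, 1, 2],
    "rating": [4.0, 5.0, 3.5],
})

# Default inner join: only movieIds present in BOTH frames survive, so
# "Movie C" (never rated) is dropped and "Movie A" appears once per rating.
inner = pd.merge(movies, ratings, on="movieId")

# Left join keeps every movie; unrated ones get NaN in the rating columns.
left = pd.merge(movies, ratings, on="movieId", how="left")

print(inner)
print(left)
```

For computing averages and counts the inner join is what we want, since unrated movies have no ratings to aggregate.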


```python
# Average rating per movie, highest first
df.groupby("title")["rating"].mean().sort_values(ascending=False).head()
```

```
title
Saddest Music in the World, The (2003)    5.0
Interstate 60 (2002)                      5.0
Gunfighter, The (1950)                    5.0
Heima (2007)                              5.0
Limelight (1952)                          5.0
Name: rating, dtype: float64
```

- The movies are now sorted in descending order of their average rating.

- However, there is a problem: a movie can reach the top of this list even if only a single user has rated it, with five stars. The statistics above can therefore be misleading. A genuinely good movie is normally rated highly by a large number of users.

- Let's count the number of users who rated each movie.

```python
# Number of ratings per movie, most-rated first
df.groupby("title")["userId"].count().sort_values(ascending=False).head()
```

```
title
Pulp Fiction (1994)                 325
Forrest Gump (1994)                 311
Shawshank Redemption, The (1994)    308
Jurassic Park (1993)                294
Silence of the Lambs, The (1991)    290
Name: userId, dtype: int64
```
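Note that `count()` counts rating rows (non-null values), not distinct users; the two match only when each user rates a given movie at most once. The hypothetical frame `toy` below includes a repeated rating to show where `nunique()` would give a different answer.

```python
import pandas as pd

# Hypothetical merged frame in which user 1 has two rating rows for "Movie A"
# (e.g. a re-rating that was never deduplicated).
toy = pd.DataFrame({
    "title":  ["Movie A", "Movie A", "Movie A", "Movie B"],
    "userId": [1, 1, 2, 3],
    "rating": [3.0, 4.0, 5.0, 2.0],
})

# count(): number of non-null rows per title, i.e. number of ratings.
ratings_per_movie = toy.groupby("title")["userId"].count()

# nunique(): number of distinct users per title.
users_per_movie = toy.groupby("title")["userId"].nunique()

print(pd.DataFrame({"ratings": ratings_per_movie, "distinct_users": users_per_movie}))
```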

- Now we can see some great movies at the top, such as Pulp Fiction (1994) and Forrest Gump (1994). The most-rated list supports our point that good movies normally receive a large number of ratings. So both the average rating per movie and the number of ratings per movie are important attributes.

- So, let's create a new dataframe that contains both of these attributes.

- We will create a new dataframe called `rating_mean_counts`, first add the average rating of each movie to it, and then add the number of ratings, as follows:

```python
# One row per title, indexed by title
rating_mean_counts = pd.DataFrame()
# Average rating of each movie
rating_mean_counts["avg_rating"] = df.groupby("title")["rating"].mean()
# Number of ratings each movie received
rating_mean_counts["No.userId"] = df.groupby("title")["rating"].count()
rating_mean_counts
```

| title | avg_rating | No.userId |
|---|---|---|
| '71 (2014) | 3.500000 | 1 |
| 'Hellboy': The Seeds of Creation (2004) | 3.000000 | 1 |
| 'Round Midnight (1986) | 2.500000 | 1 |
| 'Til There Was You (1997) | 4.000000 | 3 |
| 'burbs, The (1989) | 3.125000 | 20 |
| ... | ... | ... |
| loudQUIETloud: A Film About the Pixies (2006) | 4.500000 | 1 |
| xXx (2002) | 2.958333 | 24 |
| xXx: State of the Union (2005) | 2.071429 | 7 |
| ¡Three Amigos! (1986) | 3.012500 | 40 |


We can see each movie title along with its average rating and its number of ratings.
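With the mean and the count side by side, the single-five-star problem can be handled directly. The sketch below is not part of the original notebook; it uses a hypothetical `toy` frame in place of `df`, builds the same two columns in one pass with pandas named aggregation, and then shows two common options: a minimum-ratings threshold, and a damped mean (Bayesian average) of the same shape as the weighted-rating formula IMDb published for its Top 250, where $C$ is the global mean rating and $m$ acts as a prior number of ratings.

```python
import pandas as pd

# Hypothetical merged ratings (title, userId, rating) standing in for `df`.
toy = pd.DataFrame({
    "title":  ["Solo Gem"] + ["Crowd Pleaser"] * 4 + ["Mixed Bag"] * 3,
    "userId": [1, 1, 2, 3, 4, 1, 2, 3],
    "rating": [5.0, 4.0, 5.0, 4.5, 4.5, 3.0, 2.0, 4.0],
})

# Average and count in a single groupby pass (named aggregation).
stats = toy.groupby("title").agg(
    avg_rating=("rating", "mean"),
    n_ratings=("rating", "count"),
)

# Option 1: ignore movies with too few ratings (threshold chosen arbitrarily).
MIN_RATINGS = 3
popular = stats[stats["n_ratings"] >= MIN_RATINGS].sort_values("avg_rating", ascending=False)

# Option 2: shrink each average toward the global mean C, weighted by count v:
# score = v / (v + m) * R + m / (v + m) * C
C = toy["rating"].mean()
m = MIN_RATINGS
v, R = stats["n_ratings"], stats["avg_rating"]
stats["weighted"] = (v / (v + m)) * R + (m / (v + m)) * C

print(popular)
print(stats.sort_values("weighted", ascending=False))
```

`MIN_RATINGS` and `m` are arbitrary values for this toy data and would need tuning on the real ratings. With the damped score, the single five-star "Solo Gem" drops below "Crowd Pleaser", whose lower raw average (4.5) is backed by four ratings.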