# Installing and Using pymongo for Cosine Similarity Recommendations

## Introduction

`pymongo` is a Python library that provides tools and functionalities for working with MongoDB, a popular NoSQL database. This documentation will guide you through the process of installing `pymongo` and using it to interact with MongoDB databases.

### Installation


To install `pymongo`, use the Python package manager `pip`:

1. Open a command prompt or terminal on your computer.
2. Run the following command. The leading `!` is needed only when running it from a Jupyter notebook cell; omit it in a terminal.





In [27]:
!pip install pymongo



3. Wait for the installation to complete. Once installed, you can now import and use `pymongo` in your Python projects.
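
One way to confirm the installation succeeded, without connecting to a database, is to ask Python whether the package can be found. This is a minimal sketch; the helper name `is_installed` is illustrative and not part of `pymongo`.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be located by the import system."""
    return importlib.util.find_spec(package_name) is not None


# Prints True once `pip install pymongo` has completed in this environment.
print(is_installed("pymongo"))
```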

## Importing Required Libraries

`pymongo` itself has no dependency on them, but this walkthrough also uses `pandas` and `numpy`. They provide data frames and arrays, which are convenient for manipulating the data retrieved from MongoDB.

In [28]:
import pymongo 
import pandas as pd 
import numpy as np

## Connecting to MongoDB

To connect to a MongoDB database, you need to create an instance of the `MongoClient` class and pass the connection string as a parameter. The connection string contains information about the MongoDB server, such as the hostname, port, and authentication credentials.

In [29]:
client=pymongo.MongoClient("mongodb+srv://dipak:isJNvrJEVWg5im3q@cluster0.x4jjg.mongodb.net/test")
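
The connection string above embeds a username and password directly in the code. A common alternative is to read the string from an environment variable so that credentials stay out of notebooks and version control. The sketch below assumes a variable named `MONGODB_URI`; that name and the helper `get_mongo_uri` are illustrative choices, not something `pymongo` requires.

```python
import os


def get_mongo_uri(env_var: str = "MONGODB_URI") -> str:
    """Read the MongoDB connection string from an environment variable.

    The variable name is an assumption made for this example.
    """
    uri = os.environ.get(env_var)
    if not uri:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return uri


# Usage (requires pymongo and a reachable server):
# client = pymongo.MongoClient(get_mongo_uri())
```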

## Accessing a Database and Collection

Once connected to the MongoDB server, you can access a database with the `[]` operator on the client object, and a collection with either the `[]` operator or attribute access on the database object.

In [30]:
db=client["sportShop"]
product=db.products

In the example above, we accessed the "sportShop" database and the "products" collection within that database.

## Retrieving Documents from a Collection

To retrieve documents from a collection, you can use the `find` method on the collection object. The `find` method returns a cursor, which can be iterated to access the documents.

In [31]:
all_products=product.find({})
list_products=list(all_products)


The code above retrieves all documents from the "products" collection and converts them into a list.

## Creating a DataFrame from MongoDB Documents

To work with the retrieved MongoDB documents as a tabular data structure, you can create a DataFrame using the `pandas` library.

In [32]:
df=pd.DataFrame(list_products)
df.head()

,_id,rating,numReviews,price,countInStock,name,image,description,brand,category,user,reviews,__v,createdAt,updatedAt
0,63e7989616745843a0c18b7c,3.75,8,89.99,3,Airpods Wireless Bluetooth Headphones,/images/airpods.jpg,Bluetooth technology lets you connect it with ...,Apple,Electronics,63e7989616745843a0c18b65,"[{'_id': 63e799d03229b9401471c95f, 'name': 'sa...",8,2023-02-11 13:31:02.924,2023-05-04 11:37:57.943
1,63e7989616745843a0c18b88,3.6,5,399.99,20,Canon EOS Rebel T7 DSLR Camera,/images/6323758_bd.jpg,Capture stunning photographs with the Canon EO...,Canon,Electronics,63e7989616745843a0c18b65,"[{'_id': 63eb90ec8e2fe542a489f9d9, 'name': 're...",5,2023-02-11 13:31:02.926,2023-05-08 21:44:52.134
2,63e7989616745843a0c18b7f,3.5,6,399.99,10,Sony Playstation 4 Pro White Version,/images/playstation.jpg,The ultimate home entertainment center starts ...,Sony,Electronics,63e7989616745843a0c18b65,"[{'_id': 63e79a2b3229b9401471c961, 'name': 'sa...",6,2023-02-11 13:31:02.925,2023-05-08 22:35:35.673
3,63e7989616745843a0c18b8a,1.857143,7,199.99,30,Beats by Dre Powerbeats Pro Wireless Earphones,/images/downloadbead.jpg,Get in the zone with the Beats by Dre Powerbea...,Beats by Dre,Electronics,63e7989616745843a0c18b65,"[{'_id': 63eb90dc8e2fe542a489f9d8, 'name': 're...",7,2023-02-11 13:31:02.927,2023-04-27 12:10:38.950
4,63e7989616745843a0c18b87,2.8,5,799.99,10,Apple iPad Pro,/images/downloadapple.jpg,Get work done and stay connected on the go wit...,Apple,Electronics,63e7989616745843a0c18b65,"[{'_id': 63e79aaa3229b9401471c964, 'name': 'ra...",5,2023-02-11 13:31:02.926,2023-04-27 12:02:50.360


The code above creates a DataFrame `df` from the list of MongoDB documents and displays the first five rows using the `head` method.

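## Flattening Reviews into a Ratings DataFrame

Each product document stores its reviews as a nested list. The code below flattens that list into one row per review, carrying the parent product's ID, name, image, and price along with each rating. Note that it replaces the earlier `df`.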

In [33]:
df_list = []
for item in list_products:
    for sub_item in item['reviews']:
        # One row per review, carrying the parent product's fields along.
        df_list.append({
            'rating': sub_item['rating'], 'product_id': item['_id'],
            'user_id': sub_item['user'], 'product': item['name'],
            'image': item['image'], 'price': item['price'],
        })
df = pd.DataFrame(df_list)
print(df)

     rating                product_id                   user_id  \
0         5  63e7989616745843a0c18b7c  63e7989616745843a0c18b7b   
1         5  63e7989616745843a0c18b7c  63e7989616745843a0c18b68   
2         4  63e7989616745843a0c18b7c  63e7989616745843a0c18b77   
3         4  63e7989616745843a0c18b7c  63e7989616745843a0c18b75   
4         5  63e7989616745843a0c18b7c  63e7989616745843a0c18b66   
..      ...                       ...                       ...   
209       4  6459659e2bf0540030b42b0c  64598b04138df50030b79610   
210       3  6459659e2bf0540030b42b0c  644a5f6a4d690900314fa7a7   
211       4  645965e82bf0540030b42b18  64596d2ae640040030448085   
212       2  645965e82bf0540030b42b18  645978e5e640040030448235   
213       5  645965e82bf0540030b42b18  64598b04138df50030b79610   

                                               product  \
0                Airpods Wireless Bluetooth Headphones   
1                Airpods Wireless Bluetooth Headphones   
2                Airp
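
The nested loop above can also be expressed with `pandas.json_normalize`. The following is a sketch on made-up sample documents (the values in `sample_products` are hypothetical and only mimic the shape of the real data). Because the real reviews carry their own `_id` and `name` fields, `record_prefix` is used here to keep them from clashing with the product-level fields.

```python
import pandas as pd

# Hypothetical documents shaped like the "products" collection.
sample_products = [
    {"_id": "p1", "name": "Widget", "price": 9.99,
     "reviews": [{"name": "sam", "user": "u1", "rating": 5},
                 {"name": "ray", "user": "u2", "rating": 3}]},
    {"_id": "p2", "name": "Gadget", "price": 19.99,
     "reviews": [{"name": "sam", "user": "u1", "rating": 4}]},
]

# One row per review; product-level fields are repeated via `meta`.
ratings = pd.json_normalize(
    sample_products,
    record_path="reviews",
    meta=["_id", "name", "price"],
    record_prefix="review_",
).rename(columns={
    "review_user": "user_id",
    "review_rating": "rating",
    "_id": "product_id",
    "name": "product",
})

print(ratings[["rating", "product_id", "user_id", "product", "price"]])
```

Whether this reads better than the explicit loop is a matter of taste; the loop makes the field mapping more obvious, while `json_normalize` avoids building the intermediate list by hand.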

## Data Manipulation and Analysis

Once you have the DataFrame, you can perform data manipulation and analysis tasks using the `pandas` and `numpy` libraries. The example below builds a pivot table with one row per product and one column per user, then fills the missing ratings.

In [34]:
userRating=df.pivot_table(index=["product"],columns=['user_id'],values='rating')
userRating.head()

user_id,63e7989616745843a0c18b65,63e7989616745843a0c18b66,63e7989616745843a0c18b68,63e7989616745843a0c18b69,63e7989616745843a0c18b6a,63e7989616745843a0c18b73,63e7989616745843a0c18b75,63e7989616745843a0c18b76,63e7989616745843a0c18b77,63e7989616745843a0c18b78,...,644a5db34d690900314fa784,644a5f6a4d690900314fa7a7,644a61534d690900314fa7dd,644a63374d690900314fa808,644a65014d690900314fa839,6456380e9c6e6400328bd2e4,64596d2ae640040030448085,64597748e640040030448168,645978e5e640040030448235,64598b04138df50030b79610
product,,,,,,,,,,,,,,,,,,,,,
"AXE MEMORY Superb 512GB USB 3.1 SuperSpeed Flash Drive, Metal Casing, Optimal Read Speeds Up to 400 MB/s. Write Speeds Up to 300 MB/s",,,,,,,,,,,...,,3.0,,,,,4.0,,3.0,5.0
Airpods Wireless Bluetooth Headphones,,5.0,5.0,,,,4.0,,4.0,,...,,,,,,,,,,
"Alfa Long-Range Dual-Band AC1200 Wireless USB 3.0 Type-C Wi-Fi Adapter w/2x 5dBi External Antennas – 2.4GHz 300Mbps/5GHz 867Mbps – 802.11ac & A, B, G, N",,,,,,,,,,,...,,,,,,,,5.0,2.0,1.0
Amazon Echo Dot 3rd Generation,,,,,,,,,,5.0,...,3.0,,,,1.0,,,,3.0,
"Apple 20W USB-C Power Adapter - iPhone Charger with Fast Charging Capability, Type C Wall Charger",,,,,,,,,,,...,,,,,,,,4.0,4.0,1.0


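Because the pivot table's columns are users, calling `corr` computes the pairwise Pearson correlation between users over the products both have rated. Pairs with too few co-rated products, or with constant ratings, yield `NaN`. This matrix is exploratory; the later steps shown here rely on cosine similarity instead.

In [35]: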
corrMatrx=userRating.corr(method='pearson')
corrMatrx

user_id,63e7989616745843a0c18b65,63e7989616745843a0c18b66,63e7989616745843a0c18b68,63e7989616745843a0c18b69,63e7989616745843a0c18b6a,63e7989616745843a0c18b73,63e7989616745843a0c18b75,63e7989616745843a0c18b76,63e7989616745843a0c18b77,63e7989616745843a0c18b78,...,644a5db34d690900314fa784,644a5f6a4d690900314fa7a7,644a61534d690900314fa7dd,644a63374d690900314fa808,644a65014d690900314fa839,6456380e9c6e6400328bd2e4,64596d2ae640040030448085,64597748e640040030448168,645978e5e640040030448235,64598b04138df50030b79610
user_id,,,,,,,,,,,,,,,,,,,,,
63e7989616745843a0c18b65,,,,,,,,,,,...,,,,,,,,,,
63e7989616745843a0c18b66,,,,,,,,,,,...,,,,,,,,,,
63e7989616745843a0c18b68,,,1.0,,,,-1.0,,-1.0,,...,,,,,,,,,,
63e7989616745843a0c18b69,,,,1.0,0.5,,,,,,...,-1.0,1.0,,,,,,,,
63e7989616745843a0c18b6a,,,,0.5,1.0,,,,,,...,-0.5,0.5,,,,,-1.0,,,
63e7989616745843a0c18b73,,,,,,,,,,,...,,,,,,,,,,
63e7989616745843a0c18b75,,,-1.0,,,,1.0,,0.995871,,...,,,1.0,,,,,,,
63e7989616745843a0c18b76,,,,,,,,,,,...,,,,,,,,,,
63e7989616745843a0c18b77,,,-1.0,,,,0.995871,,1.0,,...,1.0,0.766965,0.96833,0.052414,,,,,,
63e7989616745843a0c18b78,,,,,,,,,,1.0,...,1.0,,,,,,,,,


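Cosine similarity cannot be computed on missing values, so the missing ratings are replaced with 0 before the similarity step. Note that this treats "not rated" the same as a rating of zero.

In [36]: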
userRating.fillna(0,inplace=True)

In [37]:
userRating

user_id,63e7989616745843a0c18b65,63e7989616745843a0c18b66,63e7989616745843a0c18b68,63e7989616745843a0c18b69,63e7989616745843a0c18b6a,63e7989616745843a0c18b73,63e7989616745843a0c18b75,63e7989616745843a0c18b76,63e7989616745843a0c18b77,63e7989616745843a0c18b78,...,644a5db34d690900314fa784,644a5f6a4d690900314fa7a7,644a61534d690900314fa7dd,644a63374d690900314fa808,644a65014d690900314fa839,6456380e9c6e6400328bd2e4,64596d2ae640040030448085,64597748e640040030448168,645978e5e640040030448235,64598b04138df50030b79610
product,,,,,,,,,,,,,,,,,,,,,
"AXE MEMORY Superb 512GB USB 3.1 SuperSpeed Flash Drive, Metal Casing, Optimal Read Speeds Up to 400 MB/s. Write Speeds Up to 300 MB/s",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,3.0,0.0,0.0,0.0,0.0,4.0,0.0,3.0,5.0
Airpods Wireless Bluetooth Headphones,0.0,5.0,5.0,0.0,0.0,0.0,4.0,0.0,4.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Alfa Long-Range Dual-Band AC1200 Wireless USB 3.0 Type-C Wi-Fi Adapter w/2x 5dBi External Antennas – 2.4GHz 300Mbps/5GHz 867Mbps – 802.11ac & A, B, G, N",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,2.0,1.0
Amazon Echo Dot 3rd Generation,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,...,3.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,3.0,0.0
"Apple 20W USB-C Power Adapter - iPhone Charger with Fast Charging Capability, Type C Wall Charger",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,4.0,1.0
"Apple MagSafe Charger - Wireless Charger with Fast Charging Capability, Type C Wall Charger, Compatible with iPhone and AirPods",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,2.0,0.0,0.0,0.0,0.0,4.0,0.0,2.0,3.0
Apple iPad Pro,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,...,0.0,2.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0
Beats by Dre Powerbeats Pro Wireless Earphones,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,1.0,1.0,2.0,4.0,3.0,0.0,0.0,0.0,0.0,0.0
Boytone - 2500W 2.1-Ch. Home Theater System - Black Diamond,0.0,0.0,0.0,5.0,4.0,0.0,0.0,0.0,0.0,0.0,...,2.0,4.0,0.0,0.0,0.0,0.0,0.0,4.0,4.0,0.0
Cannon EOS 80D DSLR Camera,0.0,0.0,5.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,4.0,0.0
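
The pivot and fill steps above can be tried on a tiny, made-up ratings table (the values in `toy` are hypothetical). The sketch shows two behaviours worth knowing: `pivot_table` averages duplicate ratings for the same product and user by default, and unrated combinations appear as `NaN` until they are filled.

```python
import pandas as pd

# Hypothetical ratings; user u3 rated product B twice.
toy = pd.DataFrame({
    "product": ["A", "A", "B", "B", "B"],
    "user_id": ["u1", "u2", "u1", "u3", "u3"],
    "rating": [5, 3, 4, 2, 4],
})

# Rows are products, columns are users; duplicates are averaged (default aggfunc is the mean).
table = toy.pivot_table(index="product", columns="user_id", values="rating")

# Returns a new frame instead of modifying `table` in place.
filled = table.fillna(0)

print(table)
print(filled)
```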


## Calculating Similarity Scores

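To calculate similarity scores between products based on user ratings, you can use the `cosine_similarity` function from the `sklearn.metrics.pairwise` module. Each row of `userRating` is one product's vector of ratings, so the result is a square product-by-product matrix.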

In [38]:
from sklearn.metrics.pairwise import cosine_similarity

In [39]:
similarity_score=cosine_similarity(userRating)

In [40]:
similarity_score

array([[1.        , 0.        , 0.26146048, ..., 0.75164603, 0.16570343,
        0.11877696],
       [0.        , 1.        , 0.        , ..., 0.        , 0.07276069,
        0.55414883],
       [0.26146048, 0.        , 1.        , ..., 0.21081851, 0.        ,
        0.11104687],
       ...,
       [0.75164603, 0.        , 0.21081851, ..., 1.        , 0.24494897,
        0.17558051],
       [0.16570343, 0.07276069, 0.        , ..., 0.24494897, 1.        ,
        0.        ],
       [0.11877696, 0.55414883, 0.11104687, ..., 0.17558051, 0.        ,
        1.        ]])
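
For intuition, the cosine similarity of two rating vectors a and b is their dot product divided by the product of their lengths, a·b / (‖a‖ ‖b‖). The sketch below computes the same kind of matrix with plain `numpy` on hypothetical data; the function name `cosine_similarity_rows` is illustrative and is not the scikit-learn implementation.

```python
import numpy as np


def cosine_similarity_rows(matrix):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    matrix = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    # Avoid division by zero: an all-zero row keeps similarity 0 with everything.
    norms[norms == 0] = 1.0
    unit = matrix / norms
    return unit @ unit.T


# Hypothetical product-by-user ratings (rows are products).
toy_ratings = np.array([
    [5, 0, 4],
    [5, 0, 4],
    [0, 3, 0],
    [0, 0, 0],
])

sim = cosine_similarity_rows(toy_ratings)
print(np.round(sim, 3))
```

Products rated identically by the same users score 1, and products with no raters in common score 0.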

## Generating Recommendations

With the similarity scores calculated, you can now generate product recommendations for a given product. The `recommended` function takes a product name as input and returns the three products with the highest similarity scores, each as a list of name, rating, product ID, image path, and price.

In [41]:
def recommended(product_name):
    # Row position of the requested product in the similarity matrix
    index = np.where(userRating.index == product_name)[0][0]
    # Sort by score, highest first; skip position 0 (normally the product itself) and keep the next three
    similar_items = sorted(enumerate(similarity_score[index]), key=lambda x: x[1], reverse=True)[1:4]
    data = []
    for i, _score in similar_items:
        # First review row of the product; its 'rating' is that single review's rating, not an average
        row = df[df['product'] == userRating.index[i]].drop_duplicates('product').iloc[0]
        data.append([row['product'], row['rating'], row['product_id'], row['image'], row['price']])
    return data

In [42]:
recommended("iPhone 11 Pro 256GB Memory")

[['Airpods Wireless Bluetooth Headphones',
  5,
  ObjectId('63e7989616745843a0c18b7c'),
  '/images/airpods.jpg',
  89.99],
 ['Sony PlayStation 5 Console',
  1,
  ObjectId('63e7989616745843a0c18b8d'),
  '/images/istockphoto-1287493837-612x612.jpg',
  499.99],
 ['Sony Playstation 4 Pro White Version',
  4,
  ObjectId('63e7989616745843a0c18b7f'),
  '/images/playstation.jpg',
  399.99]]
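
The ranking step can be isolated from the DataFrame lookups. The sketch below is one possible variant, not the function used above: it excludes the target by its index instead of assuming it sorts first (which can fail when another product has an identical rating vector), and it raises a clear error for unknown names. The name `top_k_similar` and the sample data are hypothetical.

```python
import numpy as np


def top_k_similar(names, similarity, target, k=3):
    """Return the k (name, score) pairs most similar to `target`, excluding itself."""
    names = list(names)
    if target not in names:
        raise KeyError(f"unknown product: {target!r}")
    idx = names.index(target)
    scores = np.asarray(similarity[idx], dtype=float)
    # Stable sort on negated scores gives descending order with ties kept in index order.
    order = np.argsort(-scores, kind="stable")
    return [(names[i], float(scores[i])) for i in order if i != idx][:k]


# Hypothetical 4-product similarity matrix.
toy_names = ["A", "B", "C", "D"]
toy_similarity = np.array([
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])

print(top_k_similar(toy_names, toy_similarity, "A", k=2))
```

With the notebook's objects, the equivalent call would pass `userRating.index` and `similarity_score`.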

In [43]:
userRating.index[5]

'Apple MagSafe Charger - Wireless Charger with Fast Charging Capability, Type C Wall Charger, Compatible with iPhone and AirPods'

## Saving Data for Web Recommendation

If you intend to use the generated user ratings, DataFrame, and similarity scores for web recommendation purposes, you can save them as pickle files.

In [44]:
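import pickle
# Write each object with a context manager so the file is closed promptly.
# The WebReco/ directory must already exist.
for name, obj in [('userRating', userRating), ('df', df), ('similarity_score', similarity_score)]:
    with open(f'WebReco/{name}.pkl', 'wb') as f:
        pickle.dump(obj, f)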
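
On the web side, the saved files have to be read back. The following is a minimal round-trip sketch using only the standard library; the helper names `save_pickle` and `load_pickle` and the temporary directory are assumptions for illustration. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path


def save_pickle(obj, path):
    """Pickle `obj` to `path`, creating the parent directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(obj, fh)


def load_pickle(path):
    """Load a pickled object from `path`."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Round trip with a stand-in object in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "WebReco" / "similarity_score.pkl"
    save_pickle([[1.0, 0.5], [0.5, 1.0]], target)
    restored = load_pickle(target)

print(restored)
```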