# Random Data Generation

This code generates two CSV files. The first, "user_rating.csv", contains the ratings that 500 unique users gave to the items they bought, on a scale from 1 to 10. The second, "list_items.csv", contains 500 unique items, each with a postcode and a price. The goal of the prediction task is to predict the rating each user would give to the items they have not bought. We fetch 10 random postcodes and assign one of them at random to each item, rather than fetching a separate random postcode for every item, which would take too long.
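
To make the prediction target concrete, the sketch below pivots a tiny, made-up ratings table into a user-by-item matrix. The values are illustrative only and are not output of the generator; the column names follow "user_rating.csv". Each missing entry (NaN) is a user-item pair with no purchase, that is, a rating the model would have to predict.

```python
import pandas as pd

# Hypothetical toy data with the same columns as user_rating.csv
ratings = pd.DataFrame({
    "UserID": [1, 1, 2, 3],
    "ItemID": [1, 2, 2, 3],
    "Rating": [7, 3, 9, 5],
})

# Rows are users, columns are items; NaN marks an item the user has not bought
matrix = ratings.pivot(index="UserID", columns="ItemID", values="Rating")

# Each NaN cell is one rating that would need to be predicted
to_predict = int(matrix.isna().sum().sum())
print(matrix)
print("ratings to predict:", to_predict)
```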

# List of Items

Each item has a unique ID from 1 to n, where n = 500, along with one of the 10 fetched postcodes and a random integer price.

In [2]:
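import random

import pandas as pd
import requests

N = 500  # number of items (also reused as the number of users)

items = []  # one dict per item, used later when generating ratings

# Column data for list_items.csv
postcode_item = []   # item IDs
postcode_name = []   # postcode assigned to each item
postcode_price = []  # price of each item

# Fetch 10 random UK postcodes from the postcodes.io API
random_postcode = []
for i in range(10):
    r = requests.get('https://api.postcodes.io/random/postcodes', timeout=10)
    r.raise_for_status()  # stop early if the request failed
    postcode = r.json()["result"]["postcode"]
    random_postcode.append(postcode)

# Give each item an ID, one of the 10 postcodes, and a random integer price
for j in range(N):
    postcode_item.append(j+1)
    name = random.choice(random_postcode)
    postcode_name.append(name)
    price = int(random.uniform(5,100))
    postcode_price.append(price)
    items.append({"item_id" : j+1, "item_postcode" : name})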
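
Because the cell above depends on a live API, it can be handy to exercise the item-building logic offline. The following is a minimal sketch, not the notebook's own code: `build_items`, its `seed` parameter, the `item_price` key, and the sample postcodes are all hypothetical names introduced only for illustration.

```python
import random

def build_items(n, postcodes, seed=None):
    """Return n item dicts with an ID, a postcode from `postcodes`, and a price."""
    rng = random.Random(seed)  # local generator so results are reproducible
    built = []
    for item_id in range(1, n + 1):
        built.append({
            "item_id": item_id,
            "item_postcode": rng.choice(postcodes),
            "item_price": rng.randint(5, 99),  # integer price from 5 to 99
        })
    return built

# Placeholder strings standing in for the 10 postcodes fetched from the API
sample_postcodes = ["AB1 2CD", "EF3 4GH", "IJ5 6KL"]
sample_items = build_items(5, sample_postcodes, seed=0)
print(sample_items)
```

Passing the same `seed` yields the same items, which makes a generated dataset easier to reproduce.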

# User Rating

Each user has a unique ID from 1 to n, where n = 500. We make the following assumptions when generating the data:

1. Each user buys each item with probability 0.5.
2. If the user does not buy an item, the user gives it no rating.
3. If the user buys an item, the user rates it with probability 0.5.
4. If the user does not rate a purchased item, we assign a rating based on how many times the user bought it, capped at 10.
5. If the user rates the item, that rating is used as is.
6. Each item's postcode is a randomly generated UK postcode.
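
The assumptions above can be sketched as a single function that decides the outcome for one user-item pair. This is an illustrative sketch rather than the notebook's implementation: `simulate_rating` and `max_purchases` are hypothetical names, and the purchase-count range of 1 to 15 is an assumption, since the text does not say how purchase counts are distributed.

```python
import random

def simulate_rating(rng, prob_buying=0.5, prob_rating=0.5, max_purchases=15):
    """Return a rating from 1 to 10, or None if the user does not buy the item."""
    if rng.random() >= prob_buying:
        return None  # assumptions 1 and 2: no purchase, so no rating
    if rng.random() < prob_rating:
        return rng.randint(1, 10)  # assumptions 3 and 5: the user's own rating
    purchases = rng.randint(1, max_purchases)  # assumed purchase-count range
    return min(purchases, 10)  # assumption 4: purchase count, capped at 10

rng = random.Random(42)
results = [simulate_rating(rng) for _ in range(10_000)]
bought = [r for r in results if r is not None]
print("share of pairs with a purchase:", len(bought) / len(results))
```

With 500 users and 500 items there are 250,000 user-item pairs, so a buying probability of 0.5 gives roughly 125,000 rating rows on average.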

In [3]:
import random

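users_id = list(range(1, N+1))  # user IDs 1..N
# Column data for user_rating.csv, one entry per purchase
user_list = []
item_list = []
postcode_list = []
rating_list = []
prob_buying = 0.5  # probability that a user buys a given item
prob_rating = 0.5  # probability that a buyer rates the item
default_rating = 0.5  # not used in the loop below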

for user in users_id:
    for item in items:
        buying = random.uniform(0,1)
        if buying < prob_buying:
            # user buys the item
            user_list.append(user)
            item_list.append(item['item_id'])
            postcode_list.append(item['item_postcode'])
            rating = random.uniform(0,1)
            if rating < prob_rating:
                # user rates the item: an integer from 1 to 10
                rating_list.append(random.randint(1, 10))
            else:
                # user does not rate the item: derive the rating from the
                # number of purchases, capped at 10. The purchase-count
                # range of 1 to 15 is an assumed placeholder.
                quantity = random.randint(1, 15)
                rating_list.append(min(quantity, 10))

# Assemble the collected columns into the two output tables
list_items = {'ItemID' : postcode_item, 'Postcode' : postcode_name, 'Price' : postcode_price}
user_ratings = {'UserID' : user_list, 'ItemID' : item_list, 'Rating': rating_list, 'Postcode' : postcode_list}
df1 = pd.DataFrame(list_items, columns=['ItemID', 'Postcode', 'Price'])
df2 = pd.DataFrame(user_ratings, columns=['UserID', 'ItemID', 'Rating', 'Postcode'])

In [4]:
df1.to_csv('list_items.csv', index = False, header=True)
df2.to_csv('user_rating.csv', index = False, header=True)
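
After writing the files, it may be worth reading them back and checking a few basic properties. The sketch below does this on a small, made-up table that stands in for `df2` and round-trips it through an in-memory buffer instead of a file on disk; the sample rows and the specific checks are assumptions chosen for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for df2; in the notebook, check the real DataFrame
sample = pd.DataFrame({
    "UserID": [1, 1, 2],
    "ItemID": [1, 2, 2],
    "Rating": [7, 3, 10],
    "Postcode": ["AB1 2CD", "EF3 4GH", "EF3 4GH"],
})

# Round-trip through CSV text instead of a file on disk
buffer = io.StringIO()
sample.to_csv(buffer, index=False, header=True)
buffer.seek(0)
loaded = pd.read_csv(buffer)

# Basic integrity checks on the reloaded data
ratings_in_range = bool(loaded["Rating"].between(1, 10).all())
no_duplicate_pairs = not loaded.duplicated(subset=["UserID", "ItemID"]).any()
print(ratings_in_range, no_duplicate_pairs)
```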