<ul>
  <li><strong>Encoding:</strong> Why read the files with <code>cp1251</code>, a Cyrillic code page? I recommend UTF-8 for broader compatibility, provided the files decode cleanly with it.</li>
  <li><strong>Deprecated Parameter:</strong> <code>error_bad_lines</code> is a parameter of <code>read_csv</code>, not a method. It was deprecated in pandas 1.3 and removed in pandas 2.0. Use <code>on_bad_lines='skip'</code> instead.</li>
</ul>
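
A minimal sketch of the suggested call, assuming pandas 1.3 or newer. The in-memory CSV below is a hypothetical stand-in for `BX-Books.csv`, and the choice of UTF-8 is an assumption that should be checked against the real files:

```python
import io
import pandas as pd

# Hypothetical sample: the second data row has too many fields.
raw = (
    "ISBN;Book-Title;Book-Author\n"
    "0001;The Hobbit;J. R. R. Tolkien\n"
    "0002;Broken;Row;With;Extra\n"
    "0003;Dune;Frank Herbert\n"
)

# on_bad_lines='skip' drops malformed rows instead of raising an error.
books_sample = pd.read_csv(
    io.BytesIO(raw.encode("utf-8")),
    encoding="utf-8",
    sep=";",
    on_bad_lines="skip",
    dtype={"ISBN": str},  # keep leading zeros in ISBNs
)
print(books_sample)
```

Skipping rows silently can hide data loss, so it may be worth comparing the row count against the raw file once.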

In [None]:
# import
import pandas as pd
import numpy as np

# load ratings
ratings = pd.read_csv('Downloads/BX-Book-Ratings.csv', encoding='cp1251', sep=';')
ratings = ratings[ratings['Book-Rating']!=0]

# load books
books = pd.read_csv('Downloads/BX-Books.csv',  encoding='cp1251', sep=';',error_bad_lines=False)

<ul>
    <li><strong>Data Integrity:</strong> The dataset contains corrupted data. Consider applying a mapping to restore the correct string values.</li>
    <li><strong>Data Filtering and Aggregation:</strong> There are many books with the same title. It's better to use a combination of <code>Book-Title</code> and <code>Book-Author</code> for more accurate filtering and aggregation.</li>
</ul>
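
To illustrate the second point, here is a small sketch with made-up data (the titles, authors, and ratings are hypothetical). Grouping by title alone merges two different books, while grouping by title and author keeps them apart:

```python
import pandas as pd

# Toy data: two different books share the same title.
toy = pd.DataFrame({
    "User-ID": [1, 2, 3, 4],
    "Book-Title": ["selected poems"] * 4,
    "Book-Author": ["a. author", "a. author", "b. writer", "b. writer"],
    "Book-Rating": [8, 10, 3, 5],
})

# Title only: both books collapse into a single group.
by_title = toy.groupby("Book-Title")["Book-Rating"].agg(["count", "mean"])

# Title plus author: one group per distinct book.
by_book = (
    toy.groupby(["Book-Title", "Book-Author"])["Book-Rating"]
    .agg(["count", "mean"])
)

print(by_title)
print(by_book)
```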

In [None]:

#users_ratigs = pd.merge(ratings, users, on=['User-ID'])
dataset = pd.merge(ratings, books, on=['ISBN'])
dataset_lowercase=dataset.apply(lambda x: x.str.lower() if(x.dtype == 'object') else x)

tolkien_readers = dataset_lowercase['User-ID'][(dataset_lowercase['Book-Title']=='the fellowship of the ring (the lord of the rings, part 1)') & (dataset_lowercase['Book-Author'].str.contains("tolkien"))]
tolkien_readers = tolkien_readers.tolist()
tolkien_readers = np.unique(tolkien_readers)

# final dataset
books_of_tolkien_readers = dataset_lowercase[(dataset_lowercase['User-ID'].isin(tolkien_readers))]

# Number of ratings per other books in dataset
number_of_rating_per_book = books_of_tolkien_readers.groupby(['Book-Title']).agg('count').reset_index()

#select only books which have actually higher number of ratings than threshold
books_to_compare = number_of_rating_per_book['Book-Title'][number_of_rating_per_book['User-ID'] >= 8]
books_to_compare = books_to_compare.tolist()

ratings_data_raw = books_of_tolkien_readers[['User-ID', 'Book-Rating', 'Book-Title']][books_of_tolkien_readers['Book-Title'].isin(books_to_compare)]

# group by User and Book and compute mean
ratings_data_raw_nodup = ratings_data_raw.groupby(['User-ID', 'Book-Title'])['Book-Rating'].mean()

# reset index to see User-ID in every row
ratings_data_raw_nodup = ratings_data_raw_nodup.to_frame().reset_index()

dataset_for_corr = ratings_data_raw_nodup.pivot(index='User-ID', columns='Book-Title', values='Book-Rating')

LoR_list = ['the fellowship of the ring (the lord of the rings, part 1)']

result_list = []
worst_list = []


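The following code averages both 'Book-Rating' and 'User-ID' for a single book, then appends the minimum of the 'Book-Rating' column to the `avgrating` list. Averaging 'User-ID' is meaningless, and because the filtered data contains only one title, the grouped result has a single row, so `min()` just returns that one mean.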

```python
tab = (ratings_data_raw[ratings_data_raw['Book-Title'] == book_title]
       .groupby(ratings_data_raw['Book-Title'])
       .mean())
avgrating.append(tab['Book-Rating'].min())
```
<b>Fix suggestion:</b>
```python
# Assumes ratings_data_raw also keeps 'Book-Author' and that `book` is a
# (title, author) tuple; neither is true in the current notebook code.
avg_rating = ratings_data_raw.loc[
    (ratings_data_raw['Book-Title'] == book[0]) &
    (ratings_data_raw['Book-Author'] == book[1]),
    'Book-Rating'
].mean()
avgrating.append(round(avg_rating, 2))
```
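
As a possible alternative to the per-book loop, the correlations and average ratings can be computed in one pass with `DataFrame.corrwith` and a single `groupby`. This is only a sketch on hypothetical toy data; the names `ratings_toy`, `pivot`, and `summary` are illustrative and not part of the notebook:

```python
import pandas as pd

# Toy ratings: three users, each rating the same three books.
ratings_toy = pd.DataFrame({
    "User-ID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Book-Title": ["fellowship", "book a", "book b"] * 3,
    "Book-Rating": [10, 9, 2, 8, 7, 4, 6, 5, 6],
})

# One row per user, one column per book.
pivot = ratings_toy.pivot(index="User-ID", columns="Book-Title", values="Book-Rating")

target = "fellowship"

# Pearson correlation of every other book with the target book.
corr = pivot.drop(columns=[target]).corrwith(pivot[target])

# Mean rating per book, computed once for all titles.
avg = ratings_toy.groupby("Book-Title")["Book-Rating"].mean().round(2)

summary = (
    pd.DataFrame({"corr": corr, "avg_rating": avg.reindex(corr.index)})
    .sort_values("corr", ascending=False)
)
print(summary)
```

With real data the pivot is sparse, so each correlation rests only on users who rated both books; a minimum-overlap check may be worth adding.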

In [None]:

# for each of the trilogy book compute:
for LoR_book in LoR_list:
    
    #Take out the Lord of the Rings selected book from correlation dataframe
    dataset_of_other_books = dataset_for_corr.copy(deep=False)
    dataset_of_other_books.drop([LoR_book], axis=1, inplace=True)
      
    # empty lists
    book_titles = []
    correlations = []
    avgrating = []

    # corr computation
    for book_title in list(dataset_of_other_books.columns.values):
        book_titles.append(book_title)
        correlations.append(dataset_for_corr[LoR_book].corr(dataset_of_other_books[book_title]))
        # why calculate mean for user_id?
        tab=(ratings_data_raw[ratings_data_raw['Book-Title']==book_title].groupby(ratings_data_raw['Book-Title']).mean())
        # why calculate min?
        avgrating.append(tab['Book-Rating'].min())
    # final dataframe of all correlation of each book   
    corr_fellowship = pd.DataFrame(list(zip(book_titles, correlations, avgrating)), columns=['book','corr','avg_rating'])
    corr_fellowship.head()

    # top 10 books with highest corr
    result_list.append(corr_fellowship.sort_values('corr', ascending = False).head(10))
    
    #worst 10 books
    worst_list.append(corr_fellowship.sort_values('corr', ascending = False).tail(10))
    
print("Correlation for book:", LoR_list[0])
#print("Average rating of LOR:", ratings_data_raw[ratings_data_raw['Book-Title']=='the fellowship of the ring (the lord of the rings, part 1'].groupby(ratings_data_raw['Book-Title']).mean()))
rslt = result_list[0]