### Data visualization

First, 1,000 rows were randomly sampled from final_df, and the pd.cut() function was used to bin the book ratings into three categories: low, medium, and high. This shows how authors perform at different rating levels. The resulting visualization displays the average rating of each author and compares authors across the rating categories, which helps explain how authors fare at different rating levels in the book recommendation system.
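The sampling and binning step described above can be sketched as follows. This is a minimal sketch on a hypothetical toy DataFrame (toy_df) rather than the real final_df; the bin edges (0 to 4 low, above 4 to 7 medium, above 7 to 10 high) and the column name 'Rating-Category' are assumptions, since the original cell is not shown.

```python
import pandas as pd

# Hypothetical stand-in for final_df: a few (author, rating) rows
toy_df = pd.DataFrame({
    'Book-Author': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Book-Rating': [2, 9, 5, 6, 10, 8],
})

# Randomly sample up to 1000 rows (capped here because the toy data is small)
sample = toy_df.sample(n=min(1000, len(toy_df)), random_state=42)

# Bin ratings into three categories; these bin edges are an assumption
sample['Rating-Category'] = pd.cut(
    sample['Book-Rating'],
    bins=[0, 4, 7, 10],
    labels=['low', 'medium', 'high'],
    include_lowest=True,
)

# Average rating of each author within each rating category (authors as rows)
avg_by_author = (
    sample.groupby(['Rating-Category', 'Book-Author'], observed=True)['Book-Rating']
    .mean()
    .unstack('Rating-Category')
)
print(avg_by_author)
```

With matplotlib installed, avg_by_author.plot.bar() would draw one group of bars per author, one bar per rating category.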

Data preparation: 
First, the 'Book-Title', 'Book-Author', 'Description', and 'Categories' columns in books_df are concatenated and split into tokens, forming a text corpus with one token list per book. 
Gensim's Word2Vec model is then trained on this corpus to obtain the word vector model word2vec_model_recommender. 

In [None]:
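# Concatenate the four text columns of each book and split the result into tokens (one token list per book)
corpus = (books_df['Book-Title'].astype(str) + ' ' + 
          books_df['Book-Author'].astype(str) + ' ' +
          books_df['Description'].astype(str) + ' ' +
          books_df['Categories'].astype(str)).apply(str.split).tolist()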
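Because the corpus cell above calls astype(str) directly, a missing description becomes the literal token 'nan'. A possible variant, sketched here on a hypothetical two-book stand-in for books_df, fills missing values with an empty string before joining and splitting. Treating missing values this way is an assumption about the desired behavior, not part of the original notebook.

```python
import pandas as pd

# Hypothetical two-book stand-in for books_df; the second book has no description
books_df = pd.DataFrame({
    'Book-Title': ['Dune', 'Emma'],
    'Book-Author': ['Frank Herbert', 'Jane Austen'],
    'Description': ['Desert planet epic', None],
    'Categories': ['Fiction', 'Fiction'],
})

text_cols = ['Book-Title', 'Book-Author', 'Description', 'Categories']

# fillna('') keeps missing values from turning into a 'nan' token;
# str.split() with no argument also collapses the resulting double space
corpus = (
    books_df[text_cols]
    .fillna('')
    .astype(str)
    .apply(' '.join, axis=1)
    .str.split()
    .tolist()
)
print(corpus)
```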

**Recommendation algorithm (Word2Vec):**
- The recommend function is the core of the recommendation algorithm. 
- Its input parameters are the user ID user_id, the complete book data, and the trained Word2Vec model word2vec_model. 
- It first collects the books the user has rated, including their titles, authors, categories, and descriptions. 
- This information is concatenated into one text and tokenized, and the vectors of all in-vocabulary tokens are averaged to obtain the user vector avg_vector. 
- The average word vector of each book is computed the same way, and its cosine similarity with avg_vector gives the book's similarity score. 
- Books are ranked by similarity score, and the 10 most similar distinct titles with a rating above 5 are selected as the recommendations. 
- Finally, the recommendations are returned as a DataFrame. 
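The scoring steps above (average the token vectors, then rank books by cosine similarity to the user vector) can be sketched with NumPy alone. This is a minimal sketch, assuming a hypothetical dictionary wv of 3-dimensional vectors in place of word2vec_model.wv; the helper names average_vector and cosine are illustrative.

```python
import numpy as np

# Hypothetical 3-dimensional word vectors standing in for word2vec_model.wv
wv = {
    'space':  np.array([1.0, 0.0, 0.0]),
    'planet': np.array([0.9, 0.1, 0.0]),
    'love':   np.array([0.0, 1.0, 0.0]),
    'ball':   np.array([0.0, 0.9, 0.1]),
}

def average_vector(text):
    """Mean of the vectors of in-vocabulary tokens; None if no token is known."""
    vectors = [wv[token] for token in text.split() if token in wv]
    return np.mean(vectors, axis=0) if vectors else None

def cosine(a, b):
    """Cosine similarity of two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# The user's rated books merged into one text, plus two candidate books
user_vector = average_vector('space planet')
books = {'Dune': 'desert planet space', 'Emma': 'love ball'}  # 'desert' is out of vocabulary

# Score every book against the user vector, then rank highest first
scores = {title: cosine(user_vector, average_vector(text)) for title, text in books.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Out-of-vocabulary tokens are simply skipped, which is why 'desert' does not affect the Dune score.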

In [None]:
from gensim.models import Word2Vec
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import pandas as pd

# Train word vectors on the tokenized book corpus (sg=1 selects skip-gram; gensim expects 0 or 1)
word2vec_model_recommender = Word2Vec(sentences=corpus, vector_size=500, window=5, min_count=5, sg=1)

# Text columns that describe a book
TEXT_COLS = ['Book-Title', 'Book-Author', 'Categories', 'Description']

def average_vector(text, word2vec_model):
    # Mean vector of all in-vocabulary tokens; None if no token is in the vocabulary
    vectors = [word2vec_model.wv[token] for token in text.split() if token in word2vec_model.wv]
    if len(vectors) == 0:
        return None
    return np.mean(vectors, axis=0)

def recommend(user_id, data, word2vec_model):
    # Get all books rated by the user
    user_preferences = data[data['User-ID'] == user_id]
    if user_preferences.empty:
        return None

    # Collect titles, authors, categories and descriptions (astype(str) avoids join errors on NaN)
    liked_books = user_preferences['Book-Title'].astype(str).tolist()
    liked_authors = user_preferences['Book-Author'].astype(str).tolist()
    liked_genres = user_preferences['Categories'].astype(str).tolist()
    liked_description = user_preferences['Description'].astype(str).tolist()

    # Merge user preference information into one text and average its word vectors
    text = ' '.join(liked_books + liked_authors + liked_genres + liked_description)
    avg_vector = average_vector(text, word2vec_model)
    if avg_vector is None:
        return None

    # Calculate the similarity between the user vector and each book (text columns only)
    similarities = []
    for _, row in data.iterrows():
        row_text = ' '.join(str(row[col]) for col in TEXT_COLS)
        row_avg_vector = average_vector(row_text, word2vec_model)
        if row_avg_vector is not None:
            similarity = cosine_similarity([avg_vector], [row_avg_vector])[0][0]
            similarities.append((row, similarity))

    # Rank by similarity and keep the top 10 distinct titles rated above 5
    similarities.sort(key=lambda x: x[1], reverse=True)
    recommendations = []
    recommended_titles = set()
    for book, sim in similarities:
        if book['Book-Title'] not in recommended_titles and book['Book-Rating'] > 5:
            recommendations.append(book.to_dict())
            recommended_titles.add(book['Book-Title'])
        if len(recommendations) >= 10:
            break

    # Convert the recommendation result to a DataFrame
    return pd.DataFrame(recommendations)

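Example usage: 
Finally, the recommend function is called to get book recommendations for the example user with ID 9714. It returns None if the user has no ratings in final_df or none of the user's tokens are in the model vocabulary. 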

In [None]:
# Example usage
user_id = 9714
recommendations = recommend(user_id, final_df, word2vec_model_recommender)
print(recommendations)

Conclusion:
This Word2Vec-based recommendation algorithm uses the text of each book (title, author, description, and category) and learns word vectors that capture semantic relationships between books. Given the books a user has rated, the system recommends other books whose text is semantically similar. This approach can provide more personalized and semantically relevant recommendations and is expected to perform better than simple popularity-based or collaborative filtering recommendation algorithms.

**Data Preprocessing**

The 'Book-Title', 'Book-Author', 'Description', and 'Categories' columns in 'final_df' are concatenated into one text per book, forming a large text corpus. 
'TfidfVectorizer' is fitted on this corpus to extract TF-IDF features, producing tfidf_matrix. 
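The TF-IDF step above can be illustrated on a tiny corpus. This is a minimal sketch, assuming a hypothetical three-book corpus_text list of already concatenated strings; it fits TfidfVectorizer, then reuses the fitted vocabulary to transform a user query and score every book with cosine similarity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical corpus: one concatenated title/author/description/category string per book
corpus_text = [
    'Dune Frank Herbert desert planet epic Fiction',
    'Emma Jane Austen matchmaking in a village Fiction',
    'Children of Dune Frank Herbert desert planet sequel Fiction',
]

tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(corpus_text)  # sparse, shape (n_books, n_terms)

# transform() (not fit_transform) keeps the vocabulary learned from the corpus
user_vector = tfidf_vectorizer.transform(['Dune'])
scores = cosine_similarity(user_vector, tfidf_matrix).ravel()
print(tfidf_matrix.shape, scores.round(3))
```

By default the vectorizer lowercases text and ignores single-character tokens such as 'a', so the toy corpus yields 16 terms.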

**Recommendation Algorithm**:
- The 'recommend' function takes the user ID and the full dataset 'final_df' as input. 
- First, it gets the titles of the books the user has rated, 'user_preferences'. 
- Then, it builds the user preference vector user_vector by applying the fitted TF-IDF vectorizer to these titles. 
- Next, it computes the cosine similarity between each book in the dataset and the user preference vector, and stores the results in the similarities list. 
- It sorts the similarities list in descending order of similarity. 
- From the sorted list, it selects the first 10 distinct titles that the user has not already rated and that have a rating above 5 as the recommendations. 
- Finally, the recommendations are returned as a DataFrame. 
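The selection rule in the last steps (skip already-rated titles, require a rating above 5, drop duplicate titles, stop at 10) can be sketched in plain Python. The function name top_n_recommendations and the (title, rating, similarity) tuple format are hypothetical, chosen only to isolate the filtering logic from pandas.

```python
def top_n_recommendations(ranked_books, already_rated, n=10, min_rating=5):
    """Pick up to n distinct titles, skipping rated titles and ratings <= min_rating.

    ranked_books: list of (title, rating, similarity) tuples (hypothetical format).
    """
    recommendations = []
    seen = set()
    # Visit books from most to least similar
    for title, rating, sim in sorted(ranked_books, key=lambda b: b[2], reverse=True):
        if title in already_rated or title in seen or rating <= min_rating:
            continue
        recommendations.append(title)
        seen.add(title)
        if len(recommendations) >= n:
            break
    return recommendations

ranked = [
    ('Dune', 9, 0.95),              # already rated by the user
    ('Children of Dune', 8, 0.90),
    ('Children of Dune', 7, 0.90),  # duplicate title from another rating row
    ('Dune Messiah', 4, 0.85),      # rating not above 5
    ('Emma', 10, 0.10),
]
result = top_n_recommendations(ranked, already_rated={'Dune'}, n=2)
print(result)
```

Because Python's sort is stable, the first of two equally similar rows for the same title is the one kept.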

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd

# Text columns concatenated into each book's document
TEXT_COLS = ['Book-Title', 'Book-Author', 'Description', 'Categories']

def build_text(df):
    # One string per row; fillna('') keeps missing values out of the vocabulary
    return df[TEXT_COLS].fillna('').astype(str).apply(' '.join, axis=1)

# Fit TF-IDF on the corpus built from final_df (tfidf_vectorizer was previously undefined)
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(build_text(final_df))

def recommend(user_id, data):
    # Get the titles of the books the user has rated (use the data argument, not final_df)
    user_preferences = data.loc[data['User-ID'] == user_id, 'Book-Title'].astype(str).tolist()
    if not user_preferences:
        return None

    # Create a user preference vector from those titles
    user_vector = tfidf_vectorizer.transform([' '.join(user_preferences)])

    # Cosine similarity between the user vector and every row of data, in one batched call
    book_vectors = tfidf_vectorizer.transform(build_text(data))
    scores = cosine_similarity(user_vector, book_vectors).ravel()

    # Visit rows in descending order of similarity and keep the top 10 recommendations
    recommendations = []
    recommended_titles = set()
    for pos in scores.argsort()[::-1]:
        book = data.iloc[pos]
        if book['Book-Title'] not in user_preferences and book['Book-Rating'] > 5 and book['Book-Title'] not in recommended_titles:
            recommendations.append(book.to_dict())
            recommended_titles.add(book['Book-Title'])
        if len(recommendations) >= 10:
            break

    # Convert to a DataFrame
    return pd.DataFrame(recommendations)

# Enter the user ID
user_id = 9714
recommendations = recommend(user_id, final_df)
recommendations