
# Using Jupyter Notebooks
:label:`sec_jupyter`


This section describes how to edit and run the code
in each section of this book
using the Jupyter Notebook. Make sure you have
installed Jupyter and downloaded the
code as described in
:ref:`chap_installation`.
If you want to know more about Jupyter, see the excellent tutorial in
the [documentation](https://jupyter.readthedocs.io/en/latest/).


## Editing and Running the Code Locally

Suppose that the local path of the book's code is `xx/yy/d2l-en/`. Use the shell to change the directory to this path (`cd xx/yy/d2l-en`) and run the command `jupyter notebook`. If your browser does not open automatically, go to http://localhost:8888 to see the Jupyter interface and all the folders containing the code of the book, as shown in :numref:`fig_jupyter00`.

![The folders containing the code of this book.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter00.png?raw=1)
:width:`600px`
:label:`fig_jupyter00`


You can access the notebook files by clicking on the folder displayed on the webpage.
They usually have the suffix ".ipynb".
For the sake of brevity, we create a temporary "test.ipynb" file.
The content displayed after you click it is
shown in :numref:`fig_jupyter01`.
This notebook includes a markdown cell and a code cell. The markdown cell contains "This Is a Title" and "This is text."
The code cell contains two lines of Python code.

![Markdown and code cells in the "test.ipynb" file.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter01.png?raw=1)
:width:`600px`
:label:`fig_jupyter01`
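
To make the structure of such a file concrete, here is a minimal sketch that builds a two-cell notebook as plain JSON with the standard library only. The helper names `build_notebook` and `write_notebook` are hypothetical, the sketch assumes version 4 of the notebook format, and the two lines in the code cell are placeholders because the exact code shown in the figure is not reproduced here.

```python
import json


def build_notebook():
    """Return a dict describing a notebook with one markdown and one code cell."""
    return {
        "cells": [
            {
                "cell_type": "markdown",
                "metadata": {},
                "source": ["# This Is a Title\n", "\n", "This is text."],
            },
            {
                "cell_type": "code",
                "execution_count": None,  # the cell has not been run yet
                "metadata": {},
                "outputs": [],
                # Placeholder code: the two lines in the figure may differ.
                "source": ["x = [1, 2, 3]\n", "print(x)"],
            },
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def write_notebook(path="test.ipynb"):
    """Serialize the notebook dict to disk as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_notebook(), f, indent=1)
```

Calling `write_notebook()` in the book's directory should produce a file that Jupyter lists next to the other notebooks.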


Double click on the markdown cell to enter edit mode.
Add a new text string "Hello world." at the end of the cell, as shown in :numref:`fig_jupyter02`.

![Edit the markdown cell.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter02.png?raw=1)
:width:`600px`
:label:`fig_jupyter02`


As demonstrated in :numref:`fig_jupyter03`,
click "Cell" $\rightarrow$ "Run Cells" in the menu bar to run the edited cell.

![Run the cell.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter03.png?raw=1)
:width:`600px`
:label:`fig_jupyter03`

After running, the markdown cell is shown in :numref:`fig_jupyter04`.

![The markdown cell after running.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter04.png?raw=1)
:width:`600px`
:label:`fig_jupyter04`


Next, click on the code cell. Multiply the elements by 2 after the last line of code, as shown in :numref:`fig_jupyter05`.

![Edit the code cell.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter05.png?raw=1)
:width:`600px`
:label:`fig_jupyter05`


You can also run the cell with a shortcut ("Ctrl + Enter" by default) and obtain the output shown in :numref:`fig_jupyter06`.

![Run the code cell to obtain the output.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter06.png?raw=1)
:width:`600px`
:label:`fig_jupyter06`


When a notebook contains more cells, we can click "Kernel" $\rightarrow$ "Restart & Run All" in the menu bar to run all the cells in the entire notebook. By clicking "Help" $\rightarrow$ "Edit Keyboard Shortcuts" in the menu bar, you can edit the shortcuts according to your preferences.

## Advanced Options

Beyond local editing, two things are quite important: editing the notebooks in the markdown format and running Jupyter remotely.
The latter matters when we want to run the code on a faster server.
The former matters since Jupyter's native ipynb format stores a lot of auxiliary data that is
irrelevant to the content,
mostly related to how and where the code is run.
This is confusing for Git, making
reviewing contributions very difficult.
Fortunately there is an alternative---native editing in the markdown format.
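
The effect of this auxiliary data can be illustrated with a small sketch. It builds two toy notebook dictionaries that contain the same source code but were "run" at different times, and shows that they only compare equal once outputs and execution counts are cleared. The helper names `strip_run_state` and `toy_notebook` are hypothetical, and this is not how notedown works internally; it only illustrates why raw ipynb files produce noisy diffs.

```python
import copy
import json


def strip_run_state(notebook):
    """Return a copy of a notebook dict without outputs or execution counts."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def toy_notebook(execution_count, output_text):
    """Build a one-cell notebook whose run state depends on the arguments."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {"output_type": "stream", "name": "stdout", "text": [output_text]}
                ],
                "source": ["import time\n", "print(time.time())"],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


first_run = toy_notebook(1, "1700000000.0\n")
second_run = toy_notebook(7, "1700000042.5\n")

# Same source code, but the serialized files differ because of the run state.
print(json.dumps(first_run) == json.dumps(second_run))  # False
# Once the run state is removed, the notebooks are identical.
print(strip_run_state(first_run) == strip_run_state(second_run))  # True
```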

### Markdown Files in Jupyter

If you wish to contribute to the content of this book, you need to modify the
source file (md file, not ipynb file) on GitHub.
Using the notedown plugin we
can modify notebooks in the md format directly in Jupyter.


First, install the notedown plugin, run the Jupyter Notebook, and load the plugin:

```
pip install d2l-notedown  # You may need to uninstall the original notedown.
jupyter notebook --NotebookApp.contents_manager_class='notedown.NotedownContentsManager'
```

You may also turn on the notedown plugin by default whenever you run the Jupyter Notebook.
First, generate a Jupyter Notebook configuration file (if it has already been generated, you can skip this step).

```
jupyter notebook --generate-config
```

Then, add the following line to the end of the Jupyter Notebook configuration file (for Linux or macOS, usually in the path `~/.jupyter/jupyter_notebook_config.py`):

```
c.NotebookApp.contents_manager_class = 'notedown.NotedownContentsManager'
```

After that, you only need to run the `jupyter notebook` command to turn on the notedown plugin by default.
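
If you prefer not to edit the configuration file by hand, the step above could be scripted along the following lines. This is only a sketch: `enable_notedown` is a hypothetical helper, and the demonstration writes to a temporary directory rather than to your real `~/.jupyter/jupyter_notebook_config.py`.

```python
import tempfile
from pathlib import Path

NOTEDOWN_LINE = (
    "c.NotebookApp.contents_manager_class = 'notedown.NotedownContentsManager'"
)


def enable_notedown(config_path):
    """Append the notedown setting to a config file unless it is already there.

    Returns True if the file was modified and False otherwise.
    """
    path = Path(config_path).expanduser()
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    if NOTEDOWN_LINE in existing:
        return False
    # Start on a fresh line if the file does not end with a newline.
    separator = "" if (not existing or existing.endswith("\n")) else "\n"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(existing + separator + NOTEDOWN_LINE + "\n", encoding="utf-8")
    return True


# Demonstration on a throwaway file, so no real configuration is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_config = Path(tmp) / "jupyter_notebook_config.py"
    demo_config.write_text("# existing settings", encoding="utf-8")
    first_call = enable_notedown(demo_config)
    second_call = enable_notedown(demo_config)
    demo_content = demo_config.read_text(encoding="utf-8")
```

The second call leaves the file unchanged, so running the helper repeatedly does not duplicate the setting.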

### Running Jupyter Notebooks on a Remote Server

Sometimes, you may want to run Jupyter notebooks on a remote server and access them through a browser on your local computer. If Linux or macOS is installed on your local machine (Windows can also support this function through third-party software such as PuTTY), you can use port forwarding:

```
ssh myserver -L 8888:localhost:8888
```

Here `myserver` is the address of the remote server.
Then we can use http://localhost:8888 to access the Jupyter notebooks running on the remote server `myserver`. We will detail how to run Jupyter notebooks on AWS instances
later in this appendix.
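
Before opening the browser, it can be handy to check that something is actually listening on the forwarded port. The following sketch uses only the standard library; `port_is_open` is a hypothetical helper, and the default port 8888 simply mirrors the command above. A successful connection only shows that the port accepts TCP connections, not that Jupyter is the program behind it.

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Port 8888 is reachable; try http://localhost:8888")
    else:
        print("Nothing is listening on port 8888; is the SSH tunnel up?")
```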

### Timing

We can use the `ExecuteTime` plugin to time the execution of each code cell in Jupyter notebooks.
Use the following commands to install the plugin:

```
pip install jupyter_contrib_nbextensions
jupyter contrib nbextension install --user
jupyter nbextension enable execute_time/ExecuteTime
```
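
When the extension is not available, a single block of code can also be timed by hand. The sketch below is an alternative to `ExecuteTime`, not a part of it: the context manager `timed` is a hypothetical helper built on `time.perf_counter` from the standard library.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Measure the wall-clock time of the enclosed block and print it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed  # keep the measurement for later use
        print(f"{label}: {elapsed:.4f} s")


timings = {}
with timed("sum of squares", timings):
    total = sum(i * i for i in range(100_000))
```

Storing the measurements in a dictionary such as `timings` makes it easy to compare several blocks afterwards.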

## Summary

* Using the Jupyter Notebook tool, we can edit, run, and contribute to each section of the book.
* We can run Jupyter notebooks on remote servers using port forwarding.


## Exercises

1. Edit and run the code in this book with the Jupyter Notebook on your local machine.
1. Edit and run the code in this book with the Jupyter Notebook *remotely* via port forwarding.
1. Compare the running time of the operations $\mathbf{A}^\top \mathbf{B}$ and $\mathbf{A} \mathbf{B}$ for two square matrices in $\mathbb{R}^{1024 \times 1024}$. Which one is faster?


[Discussions](https://discuss.d2l.ai/t/421)


# New section

In [2]:
# %%
"""
Personalized Recommendation Chatbot
Jupyter-friendly Python script with cells. This notebook builds a simple
recommendation chatbot that learns from conversational input and gives
recommendations using both content-based and collaborative-filtering
approaches. It uses synthetic data but includes instructions to plug in
real datasets like MovieLens.

Run each cell in order. The chatbot uses simple input() calls to converse.
"""

# %%
# 1. Imports
import random
import pickle
from collections import defaultdict

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MultiLabelBinarizer, normalize
from sklearn.metrics.pairwise import cosine_similarity

# %%
# 2. Create a small synthetic dataset of items (movies) and users
movies = [
    {"movieId": i+1,
     "title": t,
     "genres": g.split('|'),
     "description": d}
    for i,(t,g,d) in enumerate([
        ("The Lost Journey", "Adventure|Drama", "A small group of friends travel across harsh lands to find an ancient relic."),
        ("Space Echoes", "Sci-Fi|Adventure", "An exploration crew listens to mysterious signals from a distant star."),
        ("Love & Latte", "Romance|Comedy", "Two baristas find love in the city while serving customers with stories."),
        ("Haunted Hollow", "Horror|Thriller", "A family moves into a house with a dark secret and unsettling nights."),
        ("Code of Honor", "Action|Crime", "A detective fights corruption in a dense urban landscape."),
        ("Symphony of Rain", "Drama|Romance", "Intertwined lives of musicians trying to make it through hardship."),
        ("Pixel Wars", "Animation|Action|Family", "Tiny digital heroes save their world from a corrupt algorithm."),
        ("Green Fields", "Documentary|Nature", "A cinematic look at rural life and farming traditions."),
        ("Midnight DJ", "Music|Drama", "The rise of an underground DJ and the soundtrack of a city."),
        ("Quantum Lies", "Sci-Fi|Thriller", "A mind-bending thriller about altered memories and experiments.")
    ])
]

movies_df = pd.DataFrame(movies)
movies_df.set_index('movieId', inplace=True)

# %%
# 3. Synthetic user ratings (small user base)
user_ids = [1,2,3,4,5]
ratings = []
random.seed(42)
for u in user_ids:
    # give each user 3-6 random ratings between 1-5
    rated = random.sample(list(movies_df.index), k=random.randint(3,6))
    for m in rated:
        ratings.append({"userId": u, "movieId": m, "rating": random.randint(1,5)})
ratings_df = pd.DataFrame(ratings)

# %%
# 4. Build item content features (genres + description TF-IDF)
mlb = MultiLabelBinarizer()
genre_mat = mlb.fit_transform(movies_df['genres'])
genre_df = pd.DataFrame(genre_mat, index=movies_df.index, columns=[f"genre_{g}" for g in mlb.classes_])

# TF-IDF on descriptions
tfidf = TfidfVectorizer(max_features=50, stop_words='english')
desc_tfidf = tfidf.fit_transform(movies_df['description'])
desc_df = pd.DataFrame(desc_tfidf.toarray(), index=movies_df.index, columns=[f"tfidf_{i}" for i in range(desc_tfidf.shape[1])])

# Combine content features
item_features = pd.concat([genre_df, desc_df], axis=1)
item_features = item_features.fillna(0)

# normalize
item_features_norm = normalize(item_features, norm='l2')

# %%
# 5. Build collaborative filtering data structures (user-item matrix)
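# Reindex the columns so that every movie appears, even one that nobody rated.
# This keeps row i of item_item_sim aligned with item_features.index[i],
# which recommend_collaborative() relies on.
user_item = (
    ratings_df.pivot_table(index='userId', columns='movieId', values='rating')
    .reindex(columns=movies_df.index)
    .fillna(0)
)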

# compute item-item similarity from ratings (cosine)
item_rating_matrix = user_item.T.values  # items x users
item_item_sim = cosine_similarity(item_rating_matrix)

# %%
# 6. Helper functions for recommenders

def build_user_profile_from_preferences(preferred_genres, liked_titles):
    """Create a user content profile vector from provided genres and liked movie titles."""
    # genre part
    genre_vector = np.zeros(len(mlb.classes_))
    for g in preferred_genres:
        if g in mlb.classes_:
            idx = list(mlb.classes_).index(g)
            genre_vector[idx] = 1
    # liked titles: average their item feature vectors
    liked_ids = [mid for mid,title in zip(movies_df.index, movies_df['title']) if title in liked_titles]
    if liked_ids:
        liked_feats = item_features.loc[liked_ids].values
        liked_mean = liked_feats.mean(axis=0)
    else:
        liked_mean = np.zeros(item_features.shape[1])
    # combine: put more weight on liked items (0.7) and genres (0.3)
    # expand genre_vector to match feature space (it maps to first columns)
    genre_expanded = np.concatenate([genre_vector, np.zeros(item_features.shape[1]-len(genre_vector))])
    user_profile = 0.3 * genre_expanded + 0.7 * liked_mean
    # normalize
    if np.linalg.norm(user_profile) > 0:
        user_profile = user_profile / np.linalg.norm(user_profile)
    return user_profile


def recommend_content_based(user_profile, top_k=5, exclude_seen=None):
    sims = cosine_similarity([user_profile], item_features.values)[0]
    ranked_idx = np.argsort(sims)[::-1]
    recs = []
    for idx in ranked_idx:
        mid = item_features.index[idx]
        if exclude_seen and mid in exclude_seen:
            continue
        recs.append((mid, movies_df.loc[mid,'title'], float(sims[idx])))
        if len(recs) >= top_k:
            break
    return recs


def recommend_collaborative(user_ratings, top_k=5, exclude_seen=None):
    """
    user_ratings: dict movieId->rating for the current user
    We'll compute predicted scores as weighted sum of item similarities.
    """
    n_items = item_item_sim.shape[0]
    scores = np.zeros(n_items)
    counts = np.zeros(n_items)
    movie_index_map = {mid:i for i,mid in enumerate(item_features.index)}
    for mid, r in user_ratings.items():
        if mid not in movie_index_map:
            continue
        i = movie_index_map[mid]
        sims = item_item_sim[i]
        scores += sims * r
        counts += np.abs(sims)
    with np.errstate(divide='ignore', invalid='ignore'):
        preds = np.where(counts>0, scores / counts, 0)
    ranked_idx = np.argsort(preds)[::-1]
    recs = []
    for idx in ranked_idx:
        mid = item_features.index[idx]
        if exclude_seen and mid in exclude_seen:
            continue
        recs.append((mid, movies_df.loc[mid,'title'], float(preds[idx])))
        if len(recs) >= top_k:
            break
    return recs

# %%
# 7. Conversational chatbot class
class RecommenderChatbot:
    def __init__(self, movies_df, item_features, ratings_df, user_item):
        self.movies_df = movies_df
        self.item_features = item_features
        self.ratings_df = ratings_df
        self.user_item = user_item
        self.user_profile = None
        self.current_user_ratings = {}  # movieId->rating
        self.seen = set()

    def ask_preferences(self):
        print("Hi — I'm your recommendation assistant. I'll ask a few questions to learn your taste.")
        # ask genres
        available_genres = list(mlb.classes_)
        print("Available genres:", ', '.join(available_genres))
        g_input = input("Which genres do you like? (comma-separated, e.g. Sci-Fi,Drama)\n> ")
        preferred_genres = [g.strip() for g in g_input.split(',') if g.strip()]
        # ask liked titles
        print("Here are some sample titles:")
        print(', '.join(self.movies_df['title'].sample(min(6, len(self.movies_df))).tolist()))
        t_input = input("Any titles from above (or your favorites) you liked? (comma-separated)\n> ")
        liked_titles = [t.strip() for t in t_input.split(',') if t.strip()]
        self.user_profile = build_user_profile_from_preferences(preferred_genres, liked_titles)
        # ask for explicit ratings of a couple of items to bootstrap collaborative
        print("Great. I'll ask you to rate 3 movies (1-5) to personalize further.")
        sample_ids = list(self.movies_df.sample(min(3, len(self.movies_df))).index)
        for mid in sample_ids:
            title = self.movies_df.loc[mid,'title']
            r = input(f"Rate '{title}' from 1-5 (or press Enter to skip):\n> ")
            # Accept only whole-number ratings inside the advertised 1-5 range.
            if r.strip().isdigit() and 1 <= int(r.strip()) <= 5:
                self.current_user_ratings[mid] = int(r.strip())
                self.seen.add(mid)
        print("Thanks — I learned from your answers.")

    def show_recommendations(self, method='hybrid', top_k=5):
        """Print and return the top_k items ranked by a blended score.

        `method` is accepted for future use; only the hybrid blend is implemented.
        """
        if self.user_profile is None:
            print("No profile yet: please run ask_preferences() first.")
            return None
        content_recs = recommend_content_based(self.user_profile, top_k=top_k*3, exclude_seen=self.seen)
        collab_recs = recommend_collaborative(self.current_user_ratings, top_k=top_k*3, exclude_seen=self.seen)
        content_dict = {mid: score for mid, _, score in content_recs}
        collab_dict = {mid: score for mid, _, score in collab_recs}

        def min_max(scores):
            # Rescale to [0, 1] so cosine similarities and 1-5 ratings are comparable.
            if not scores:
                return {}
            lo, hi = min(scores.values()), max(scores.values())
            if hi == lo:
                return {mid: 0.0 for mid in scores}
            return {mid: (s - lo) / (hi - lo) for mid, s in scores.items()}

        content_norm = min_max(content_dict)
        collab_norm = min_max(collab_dict)
        # Weighted blend: 0.6 content-based, 0.4 collaborative.
        final_scores = {
            mid: 0.6 * content_norm.get(mid, 0) + 0.4 * collab_norm.get(mid, 0)
            for mid in set(content_norm) | set(collab_norm)
        }
        ranked = sorted(final_scores.items(), key=lambda x: x[1], reverse=True)[:top_k]
        print(f"\nTop {top_k} recommendations for you (title: score):")
        for mid, score in ranked:
            print(f"- {self.movies_df.loc[mid,'title']}: {score:.3f}")
        return ranked

    def feedback_loop(self):
        while True:
            action = input("Would you like another recommendation, rate a recommended movie, or quit? (recommend/rate/quit)\n> ").strip().lower()
            if action.startswith('r') and 'rate' in action:
                # show last recs and ask to rate
                mid_s = input("Enter movieId you want to rate (or title):\n> ")
                try:
                    mid = int(mid_s)
                except ValueError:
                    # Not a number: look the movie up by (partial) title instead.
                    matches = self.movies_df[self.movies_df['title'].str.contains(mid_s, case=False, regex=False)]
                    if len(matches) > 0:
                        mid = int(matches.index[0])  # pick the first match
                    else:
                        print("Couldn't find the movie.")
                        continue
                if mid not in self.movies_df.index:
                    print("Couldn't find the movie.")
                    continue
                r = input("Rating 1-5:\n> ")
                if r.strip().isdigit() and 1 <= int(r.strip()) <= 5:
                    rnum = int(r.strip())
                    self.current_user_ratings[mid] = rnum
                    self.seen.add(mid)
                    print("Thanks — updated your profile. Here's a new list:")
                    self.show_recommendations()
            elif action.startswith('rec'):
                self.show_recommendations()
            elif action.startswith('q'):
                print("Goodbye! Save preferences? (yes/no)")
                s = input('> ')
                if s.lower().startswith('y'):
                    # save to disk
                    with open('user_profile.pkl','wb') as f:
                        pickle.dump({'profile':self.user_profile,'ratings':self.current_user_ratings}, f)
                    print("Saved to user_profile.pkl")
                break
            else:
                print("I didn't get that. Please answer 'recommend', 'rate', or 'quit'.")

# %%
# 8. Demo run
bot = RecommenderChatbot(movies_df, item_features, ratings_df, user_item)

print("--- Quick demo: run ask_preferences() to start a conversational flow. ---")
print("Example: bot.ask_preferences()  then bot.show_recommendations() then bot.feedback_loop()")

# %%
# 9. Utilities: how to plug in MovieLens (instructions)
"""
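If you want to use a real dataset such as MovieLens, do the following:
1. Download the dataset files. The ml-latest-small release ships ratings.csv and movies.csv; the older 100K and 1M releases use differently formatted files that need their own loading code.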
2. Load movies.csv and ratings.csv into pandas DataFrames.
3. Preprocess movies: extract genres into lists and descriptions if available.
4. Replace the synthetic movies_df, ratings_df, and recompute item_features and item-item similarities.

Example snippet:

movies_real = pd.read_csv('movies.csv')  # contains movieId,title,genres
movies_real['genres'] = movies_real['genres'].apply(lambda s: s.split('|'))
ratings_real = pd.read_csv('ratings.csv')  # userId,movieId,rating,timestamp

# then build item_features similarly using MultiLabelBinarizer + TF-IDF on descriptions if you have them.
"""

# EOF
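
The weighted-sum prediction used by `recommend_collaborative` can be checked by hand on a tiny example. The following standalone sketch reimplements the same idea in pure Python: a predicted score is the similarity-weighted average of the ratings the user has already given. The toy matrix `item_ratings` and the function names `cosine` and `predict` are made up for illustration and are not part of the notebook above.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length rating vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy data: one row per item, one column per user, 0 means "not rated".
item_ratings = {
    "A": [5, 4, 0],
    "B": [4, 5, 0],
    "C": [0, 1, 5],
}


def predict(user_ratings, target):
    """Predict a score for `target` from a dict of item -> rating."""
    weighted_sum = 0.0
    weight_total = 0.0
    for item, rating in user_ratings.items():
        sim = cosine(item_ratings[item], item_ratings[target])
        weighted_sum += sim * rating
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total else 0.0


# Item B is rated very much like A and hardly like C, so a user who gave
# A a 5 and C a 1 gets a prediction for B that lies close to 5.
print(round(predict({"A": 5, "C": 1}, "B"), 2))
```

Because the result is a weighted average, it always stays within the range of the ratings the user supplied, which is why it needs rescaling before being mixed with cosine similarities.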



