# Getting Started with Jupyter Notebooks in VS Code

Welcome to this tutorial! Today, we'll learn how to set up and use Jupyter Notebooks in Visual Studio Code for local development without internet access.

## Prerequisites
- Visual Studio Code installed
- Python 3 installed and available from the terminal
- Basic understanding of Python

## Installing Required Extensions (VSIX)

For working with Jupyter Notebooks locally, you'll need to install these VS Code extensions:

1. **Python Extension**: Provides Python language support
2. **Jupyter Extension**: Supports Jupyter notebook functionality
3. **Jupyter Keymap**: Adds Jupyter keyboard shortcuts
4. **Jupyter Notebook Renderers**: Enhances notebook output rendering

### How to Install VSIX files:
1. Open VS Code
2. Go to Extensions view (Ctrl+Shift+X)
3. Click on the "..." menu (top-right of Extensions view)
4. Select "Install from VSIX..."
5. Navigate to the folder containing your VSIX files
6. Select and install each required extension

**Note**: These extensions are pre-downloaded for you since you're working in an offline environment.
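If you have several VSIX files, the manual steps above can also be scripted. VS Code's `code` command-line launcher accepts `--install-extension <path-to-vsix>`. The sketch below is a minimal illustration, assuming the `code` command is on your PATH and that your files sit in a folder called `vsix_files` (a hypothetical name). The helper `build_install_commands` is introduced here for illustration only.

```python
import shutil
import subprocess
from pathlib import Path


def build_install_commands(vsix_dir):
    """Return one `code --install-extension` command per .vsix file, sorted by name."""
    folder = Path(vsix_dir)
    return [
        ["code", "--install-extension", str(path)]
        for path in sorted(folder.glob("*.vsix"))
    ]


if __name__ == "__main__":
    # "vsix_files" is a hypothetical folder name; point it at your own folder.
    commands = build_install_commands("vsix_files")
    code_cli = shutil.which("code")
    for command in commands:
        if code_cli is None:
            # The launcher is not on PATH, so only show what would be run.
            print("Would run:", " ".join(command))
        else:
            subprocess.run([code_cli, *command[1:]], check=False)
```

Building the command list separately from running it lets you review what will be installed before anything changes.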

## 1- Your First Jupyter Notebook

Let's run a simple "Hello World" example to ensure everything is working correctly.

In [None]:
# Your first Jupyter code cell
print("Hello, Jupyter World!")

# Let's check Python version
import sys
print(f"Python version: {sys.version}")

### Note on Execution Results

If the Hello World example failed to run, don't worry! This is normal at this stage. VS Code may not yet have a Python interpreter (kernel) selected for the notebook, so the notebook cannot find your Python installation.

We'll explain below how to properly set up a Jupyter Notebook with a Python environment. This setup ensures that:
1. Your notebook can find and use your Python installation
2. Dependencies are managed properly
3. Your work is reproducible across different machines

**Important:** If you were able to run the Hello World example using your basic Python installation without relying on a virtual environment, your packages are installed in the global (system-wide) Python installation. While this works, it may create conflicts in the future, because different projects might require different versions of the same packages.

The virtual environment setup we'll cover next is considered a best practice for Python development.

## 2- Working with Virtual Environments

A virtual environment is an isolated Python environment that allows you to install packages without affecting your system Python installation.

### Creating a Virtual Environment
Run this in the VS Code terminal:

In [None]:
# Create a virtual environment in a new folder
# Replace <name_my_venv> with your desired name, e.g., venv_myproject
python -m venv <name_my_venv>

# Activate the virtual environment (use the same name you chose above):
# On Windows
.\<name_my_venv>\Scripts\activate
# On macOS and Linux
source <name_my_venv>/bin/activate
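The activation script lives inside the environment folder, so its path depends on both the name you chose and your operating system. As a small sketch of that rule (the helper `activation_command` is an illustrative assumption, not part of any tool, and `venv_myproject` is just the example name used above):

```python
import os


def activation_command(env_name, os_name=os.name):
    """Return the shell command that activates a venv folder named env_name."""
    if os_name == "nt":
        # Windows: the script is in the Scripts subfolder.
        return f".\\{env_name}\\Scripts\\activate"
    # macOS and Linux: the script is in bin and must be sourced.
    return f"source {env_name}/bin/activate"


print(activation_command("venv_myproject"))
```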

### Understanding the Virtual Environment Indicator

After activating your virtual environment, the name of your environment (e.g., `(venv_myproject)`) appears at the beginning of your command prompt, often in green depending on your terminal. This is an important visual indicator that:

1. You are now working within the isolated Python environment you created
2. Any packages you install with pip will only be installed in this environment
3. Python will use only the libraries installed in this environment, not your global Python installation

This indicator helps you avoid confusion about which Python environment you're currently using. If you don't see it, your `python` and `pip` commands will affect your system-wide Python installation instead of your project-specific environment.

Remember: Always check for this indicator before installing packages or running Python code to ensure you're working in the intended environment.
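The prompt indicator can also be double-checked from Python itself. The following is a minimal sketch, assuming the environment was created with the standard `venv` module. The function name `in_virtual_env` is introduced here for illustration.

```python
import os
import sys


def in_virtual_env():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_env())
# Activation scripts set VIRTUAL_ENV; it is absent if nothing was activated.
print("VIRTUAL_ENV:", os.environ.get("VIRTUAL_ENV", "<not set>"))
```

The same check works in a notebook cell, where it reports the interpreter behind the selected kernel rather than the one in your terminal.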

## 3- Installing Python Packages in Your Virtual Environment

Now that we have our virtual environment created and activated (indicated by the environment name at the start of the terminal prompt), we can install Python packages that will be isolated to this environment.

### Benefits of Installing in a Virtual Environment:
- Packages won't interfere with your global Python installation
- Different projects can use different package versions
- Dependencies are clearly documented and reproducible
- You can easily share your environment configuration with others

### Installing Packages with pip:
The standard tool for installing Python packages is pip. With your virtual environment activated, any packages you install will be contained within that environment.

The next cell lists the commands that install and verify the following packages. Run them in the VS Code terminal with your virtual environment activated.

1. **pandas**: A powerful data manipulation and analysis library
2. **notebook**: Core components for Jupyter Notebook functionality
3. **pandasql**: Lets you query pandas DataFrames with SQL syntax

These packages cover the steps in this tutorial: loading data, running notebooks, and querying DataFrames with SQL.

In [None]:
# Install pandas, notebook, and pandasql packages
pip install pandas notebook pandasql

# Verify the installation by checking the installed packages
pip freeze

# Create a requirements.txt file with the installed packages and their versions
pip freeze > requirements.txt
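Besides `pip freeze`, the installation can be verified from Python with the standard library's `importlib.metadata`. This is a minimal sketch. The helper `installed_versions` is a name introduced here for illustration.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["pandas", "notebook", "pandasql"]).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

A package reported as missing usually means the commands ran in a different environment than the one this interpreter belongs to.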

## 4- Now Let's Finally Use Jupyter Notebook in VS Code

1. **Select your Python interpreter (virtual environment)**:
    - Click on the Python environment selector in the bottom status bar of VS Code
    - OR click on "Select Kernel" in the top-right corner of your notebook
    - Choose the virtual environment you created earlier from the dropdown list

2. **Run your first notebook cell**:
    - Click the "Play" button to the left of the cell below
    - OR use the keyboard shortcut: Shift+Enter

This runs the cell with your virtual environment as the notebook kernel, giving you access to all the packages you installed.

**Note**: You'll know your virtual environment is active when you see its name displayed in the kernel selector at the top-right of the notebook.

Now you're ready to start working with your data in an isolated, reproducible environment!

In [30]:
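# Read the requirements.txt file with pandas
import pandas as pd

# Each line holds one "package==version" entry, so the file loads as a single column.
# encoding='UTF-16' matches files written by `pip freeze > requirements.txt` in Windows PowerShell;
# use encoding='utf-8' if the file was created on macOS/Linux or by another shell.
# header=None keeps the first line as data instead of treating it as a header
df = pd.read_csv('requirements.txt', encoding='UTF-16', header=None)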

# Display the first few rows
df.head()

|   | 0 |
|---|---|
| 0 | anyio==4.9.0 |
| 1 | argon2-cffi==25.1.0 |
| 2 | argon2-cffi-bindings==21.2.0 |
| 3 | arrow==1.3.0 |
| 4 | asttokens==3.0.0 |
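Because the right encoding depends on how `requirements.txt` was written, it can help to try more than one. The sketch below uses only the standard library and assumes that a UTF-16 file starts with a byte-order mark, as Windows PowerShell output normally does. The helper `read_requirements` is a hypothetical name.

```python
from pathlib import Path


def read_requirements(path, encodings=("utf-8-sig", "utf-16")):
    """Return the non-empty lines of a requirements file, trying each encoding in turn."""
    raw = Path(path).read_bytes()
    for encoding in encodings:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # Wrong guess; try the next encoding.
        return [line.strip() for line in text.splitlines() if line.strip()]
    raise ValueError(f"Could not decode {path} with any of {encodings}")


if Path("requirements.txt").exists():
    print(read_requirements("requirements.txt")[:5])
```

UTF-8 is tried first because the byte-order mark of a UTF-16 file is not valid UTF-8, so a wrong first guess fails cleanly and the fallback takes over.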


## 5- Data Engineering Preparation

Now we'll run a small parsing step that splits each entry into separate columns (package name, version, and version components). This makes the data easier to query and lets us analyze our package versions more effectively.

### Data Wrangler Extension

To get a more interactive data manipulation experience:

1. Install the **Data Wrangler VSIX** extension for VS Code
    - Follow the same VSIX installation steps mentioned earlier
    - This extension provides an interactive UI for data transformation

2. After installation:
    - Rerun this notebook
    - Look for the "Open df in Data Wrangler" button that will appear above the DataFrame outputs
    - Click this button to open an interactive view where you can filter, transform, and visualize the data without writing code

This tool is especially helpful for exploratory data analysis and quick transformations!

In [31]:
# Parse package information into separate columns
# First, split the package name and version
df[['package', 'version']] = df[0].str.split('==', expand=True)

# Split the version into components
version_parts = df['version'].str.split('.', expand=True)
df['major_version'] = version_parts[0]
df['minor_version'] = version_parts[1]
df['patch_version'] = version_parts[2]

# Display the transformed DataFrame
df.head()

|   | 0 | package | version | major_version | minor_version | patch_version |
|---|---|---|---|---|---|---|
| 0 | anyio==4.9.0 | anyio | 4.9.0 | 4 | 9 | 0 |
| 1 | argon2-cffi==25.1.0 | argon2-cffi | 25.1.0 | 25 | 1 | 0 |
| 2 | argon2-cffi-bindings==21.2.0 | argon2-cffi-bindings | 21.2.0 | 21 | 2 | 0 |
| 3 | arrow==1.3.0 | arrow | 1.3.0 | 1 | 3 | 0 |
| 4 | asttokens==3.0.0 | asttokens | 3.0.0 | 3 | 0 | 0 |
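The same splitting rule, written for a single line in plain Python, makes the edge cases easier to see: entries without `==` and versions with fewer than three components. This is only a sketch; `parse_requirement` is a hypothetical helper, not part of pandas.

```python
def parse_requirement(line):
    """Split 'name==1.2.3' into name, version, and up to three version components."""
    package, separator, version = line.strip().partition("==")
    if not separator:
        # Not a pinned 'name==version' entry; keep the text and leave the rest empty.
        return {"package": package, "version": None,
                "major": None, "minor": None, "patch": None}
    parts = version.split(".")
    # Pad so that short versions such as '2025.1' still yield three components.
    parts += [None] * (3 - len(parts))
    return {"package": package, "version": version,
            "major": parts[0], "minor": parts[1], "patch": parts[2]}


print(parse_requirement("anyio==4.9.0"))
```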


In [32]:
# Import pandasql to run SQL queries on pandas DataFrames
from pandasql import sqldf

# Define a function to use SQL with pandas
def pysqldf(q):
    return sqldf(q, globals())

# Write a SQL query to count the number of packages
query = """
SELECT COUNT(*) AS package_count
FROM df
"""

# Execute the query
result = pysqldf(query)

# Display the result
print(f"Total number of packages: {result['package_count'].iloc[0]}")

# Get the count of packages by major version
version_query = """
SELECT major_version, COUNT(*) AS count
FROM df
GROUP BY major_version
ORDER BY count DESC
LIMIT 10
"""

# Execute the version query
version_count = pysqldf(version_query)
print("\nTop 10 major versions by package count:")
print(version_count)

Total number of packages: 100

Top 10 major versions by package count:
  major_version  count
0             0     25
1             2     18
2             1     14
3             3     10
4             4      6
5             7      4
6             6      4
7             5      4
8          2025      4
9            25      3
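pandasql runs its queries through SQLite, so the grouping query above can also be tried with Python's built-in `sqlite3` module, without a DataFrame. The following is a minimal sketch that uses made-up sample rows (not the tutorial's real data) and a hypothetical table named `packages`.

```python
import sqlite3

# Made-up sample rows for illustration only: (package, major_version).
sample_rows = [
    ("pkg_a", "4"),
    ("pkg_b", "1"),
    ("pkg_c", "3"),
    ("pkg_d", "25"),
    ("pkg_e", "25"),
]


def count_by_major_version(rows):
    """Return (major_version, count) pairs, most frequent first."""
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute("CREATE TABLE packages (package TEXT, major_version TEXT)")
        connection.executemany("INSERT INTO packages VALUES (?, ?)", rows)
        cursor = connection.execute(
            """
            SELECT major_version, COUNT(*) AS count
            FROM packages
            GROUP BY major_version
            ORDER BY count DESC, major_version ASC
            """
        )
        return cursor.fetchall()
    finally:
        connection.close()


print(count_by_major_version(sample_rows))
```

The second sort key is an addition in this sketch: it makes the order of tied counts deterministic. Because `major_version` is stored as text, ties are ordered alphabetically rather than numerically.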


## 6- Let's Visualize Package Distributions with Plotly

Before we can run the visualization in the next cell, we need to make sure the `plotly` package is installed in our virtual environment:

1. **Install Plotly**:
    - Go back to cell 9 where we installed packages
    - Add plotly to your local virtual environment: `pip install plotly`
    - Run that cell again to install plotly

2. **Update Requirements File**:
    - After installation, rerun the cell that generates requirements.txt
    - This will ensure plotly is documented in your project dependencies

3. **Refresh Data**:
    - Rerun the cells that load and process the requirements.txt file (cells 11-14)
    - This will ensure your data includes the newly installed plotly package

4. **Check the Data**:
    - Consider examining the `version_count` DataFrame in Data Wrangler before visualization
    - Click the "Open in Data Wrangler" button that appears above the DataFrame output
    - This gives you an interactive way to explore the data before plotting

Once these steps are complete, you'll be ready to run the visualization cell below to see the distribution of packages by major version!

In [33]:
# Import plotly
import plotly.express as px

# Create a bar chart to visualize the distribution of packages by major version
fig = px.bar(
    version_count,
    x='major_version',
    y='count',
    title='Distribution of Packages by Major Version',
    labels={'major_version': 'Major Version', 'count': 'Number of Packages'},
    color='count',
    color_continuous_scale='Viridis'
)

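# Improve layout
fig.update_layout(
    xaxis_title='Major Version',
    yaxis_title='Number of Packages',
    # type='category' keeps version strings such as '2025' as labels instead of numbers;
    # categoryorder sorts the bars by count
    xaxis={'type': 'category', 'categoryorder': 'total descending'}
)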

# Display the figure
fig.show()

## 7- Exporting Your Analysis Results

Now that we've analyzed and visualized our package data, let's export the results so they can be shared or used outside of this notebook:

1. **CSV Export**: Save the version count data in a CSV file for use in other applications
2. **HTML Export**: Export the interactive visualization as a standalone HTML file

These exports will be saved in your current working directory.

In [None]:
# Export the version count data to CSV
csv_file_path = 'package_version_counts.csv'
version_count.to_csv(csv_file_path, index=False)
print(f"Version count data exported to: {csv_file_path}")

# Export the Plotly figure to HTML
html_file_path = 'package_version_visualization.html'
fig.write_html(html_file_path)
print(f"Interactive visualization exported to: {html_file_path}")
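To confirm that the CSV export contains what you expect, you can read it back with the standard library. This is a small sketch. The helper `summarize_csv` is a hypothetical name, and the file name matches the one used in the export cell.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return the header and the number of data rows in a CSV file."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count


csv_path = Path("package_version_counts.csv")
if csv_path.exists():
    header, row_count = summarize_csv(csv_path)
    print(f"{csv_path}: columns={header}, rows={row_count}")
else:
    print(f"{csv_path} not found; run the export cell first.")
```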


## Conclusion and Next Steps

You've now successfully:

1. Set up a Jupyter Notebook environment in VS Code
2. Created and activated a virtual environment
3. Installed essential data science packages
4. Verified your installation works correctly
5. Created your first pandas DataFrame
6. Used SQL to query your data
7. Used Data Wrangler and Plotly to explore and visualize your data
8. Exported your analysis results as CSV and HTML files

### What's Next?

Now that you have a working environment, you can:

1. **Import your own data**: Use pandas to read CSV, Excel, or database files
2. **Create visualizations**: Try matplotlib, seaborn, or plotly for data visualization
3. **Build data pipelines**: Develop reusable data cleaning and transformation processes
4. **Share your work**: Export your notebooks or create reports from your analyses

Remember to keep your virtual environment activated when working on this project!

Happy data analyzing! 📊📈