# Web Mining and Applied NLP (CSIS 44-620)

## P2: Employ Python Data Structures, Notebooks & Engage

### 
Author: Data-Git-Hub <br>
GitHub Project Repository Link: https://github.com/Data-Git-Hub/python-ds <br>
4 July 2025 <br>

### Introduction
In this assignment, I apply foundational Python skills related to data structures and notebook-based workflows. Using Jupyter Notebooks in VS Code, I complete a series of exercises that demonstrate proficiency with lists, dictionaries, tuples, and sets, while also exploring functions and basic control flow. The project emphasizes clear documentation using Markdown, code execution, and exporting work to HTML, reinforcing best practices for reproducibility and communication in data analytics. <br>

### Tasks
Perform the tasks described in the Markdown cells below. When you have completed the assignment, make sure all of your code cells have been run (and have output beneath them), and ensure you have committed and pushed ALL of your changes to your assignment repository. <br>

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a Python file (`.py`), then import and run the appropriate code to answer the question. <br>

Do not use external modules (`math`, etc.) for this assignment unless you are explicitly instructed to, though you may use built-in Python functions (`min`, `max`, etc.) as you wish. <br>

---

#### Section 1. 

Modify the Markdown cell above to put your name after "Student Name:"; you will be expected to do this in all assignments presented in this format for this class. <br>

---

#### Section 2. 

Write code that divides any two numbers, stores the result in a variable, and prints the result with an appropriate label. <br>

In [44]:
# Divide two numbers
numerator = 12
denominator = 4

result = numerator / denominator

# Print result with a label
print(f"The result of dividing {numerator} by {denominator} is {result}")

The result of dividing 12 by 4 is 3.0
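
Because the task asks for code that divides "any" two numbers, a zero denominator is worth considering. The following is only a sketch: `safe_divide` is a hypothetical helper name, not part of the cell above, and returning `None` for a zero denominator is one design choice among several (raising an error would be another).

```python
def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


# Try one ordinary division and one division by zero.
for numerator, denominator in [(12, 4), (7, 0)]:
    result = safe_divide(numerator, denominator)
    if result is None:
        print(f"Cannot divide {numerator} by {denominator}: the denominator is zero")
    else:
        print(f"The result of dividing {numerator} by {denominator} is {result}")
```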


#### Section 3. 

Using loops (and potentially conditionals), write Python code that prints the factorial of each integer from 1 through 10 (which you can store in a variable if you want). The factorial of an integer is the product of all integers from 1 through that integer. Print each result with an appropriate label. <br>

In [45]:
# Calculate and print factorials from 1 to 10
for i in range(1, 11):
    factorial = 1
    for j in range(1, i + 1):
        factorial *= j
    print(f"The factorial of {i} is {factorial}")

The factorial of 1 is 1
The factorial of 2 is 2
The factorial of 3 is 6
The factorial of 4 is 24
The factorial of 5 is 120
The factorial of 6 is 720
The factorial of 7 is 5040
The factorial of 8 is 40320
The factorial of 9 is 362880
The factorial of 10 is 3628800
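
Since each factorial is the previous one multiplied by the next integer (n! = n × (n − 1)!), the inner loop can be replaced by a running product. This is an alternative sketch, not the approach used in the cell above; the names `factorials` and `running_product` are chosen here for illustration.

```python
# Running-product variant: reuse the previous factorial instead of recomputing it.
factorials = {}       # maps n -> n!
running_product = 1

for n in range(1, 11):
    running_product *= n          # n! = n * (n - 1)!
    factorials[n] = running_product
    print(f"The factorial of {n} is {running_product}")
```

This does one multiplication per integer, whereas the nested-loop version repeats all earlier multiplications for every new integer.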


#### Section 4. 

Write a Python function that takes a single parameter and calculates and returns the average (mean) of the values in the parameter (which you may assume is iterable). Show that your function works by printing the result of calling the function on the list in the cell below. <br>

In [46]:
# Define a function to calculate the mean
def calculate_mean(values):
    # Materialize the input so any iterable (including generators) has a length
    values = list(values)
    return sum(values) / len(values)

# Test the function
testlist = [1, -1, 2, -2, 3, -3, 4, -4]
mean_result = calculate_mean(testlist)

# Print the result
print(f"The mean of the list is {mean_result}")

The mean of the list is 0.0


#### Section 5. 

Using your mean function above, write a function that calculates the variance of a list of numbers (see https://en.wikipedia.org/wiki/Variance for more information on the formula). In short: <br>

* subtract the mean of the elements in the list from every element in the list; store these values in a new list <br>
* square every element in the new list and sum the elements together <br>
* divide the resulting number by N (where N is the length of the original list) <br>

Show the result of calling your function on the lists in the code cell. You must use one or more list comprehensions or map/filter in your code. <br>


In [47]:
# Function to calculate variance using the previously defined calculate_mean
def calculate_variance(values):
    mean = calculate_mean(values)
    squared_diffs = [(x - mean) ** 2 for x in values]
    return sum(squared_diffs) / len(values)

# Test lists
list1 = [ 5.670e-1, -1.480e+0, -5.570e-1, -1.470e+0, 7.340e-1, 1.050e+0, 4.480e-1, 2.570e-1, -1.970e+0, -1.460e+0]
list2 = [-1.780e+0, 2.640e-1, 1.160e+0, 9.080e-1, 1.780e+0, 1.080e+0, 1.050e+0, -4.630e-2, 1.520e+0, 5.350e-1]

# Calculate and print variances
variance1 = calculate_variance(list1)
variance2 = calculate_variance(list2)

# The variances of both lists should be close to 1 (off by less than 0.15)
print(f"Variance of list1: {variance1}")
print(f"Variance of list2: {variance2}")

Variance of list1: 1.13973309
Variance of list2: 0.9257232841
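
The task also allows `map`/`filter` in place of a list comprehension. As a sketch of that alternative, the block below computes the same population variance (dividing by N) with `map`, and checks the stated tolerance of 0.15 around 1. The names `mean` and `variance_with_map` are hypothetical and are defined here only so the block runs on its own.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def variance_with_map(values):
    """Population variance (divide by N), built with map instead of a comprehension."""
    values = list(values)
    center = mean(values)
    squared_diffs = map(lambda x: (x - center) ** 2, values)
    return sum(squared_diffs) / len(values)


list1 = [5.670e-1, -1.480e+0, -5.570e-1, -1.470e+0, 7.340e-1,
         1.050e+0, 4.480e-1, 2.570e-1, -1.970e+0, -1.460e+0]
list2 = [-1.780e+0, 2.640e-1, 1.160e+0, 9.080e-1, 1.780e+0,
         1.080e+0, 1.050e+0, -4.630e-2, 1.520e+0, 5.350e-1]

for label, data in (("list1", list1), ("list2", list2)):
    var = variance_with_map(data)
    within_tolerance = abs(var - 1) < 0.15
    print(f"Variance of {label}: {var:.4f} (within 0.15 of 1: {within_tolerance})")
```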


#### Section 6. 

Create a list with at least 15 elements in it. Use list slicing to print the following: <br>

* The first 5 elements of the list <br>
* The last 5 elements of the list <br>
* The list reversed (hint: slice the entire list with a stride of -1) <br>
* Every second element in the list <br>
* Every third element in the list (stride of 3) <br>

In [48]:
# Manually defined list of the first 31 prime numbers
prime_numbers = [
     2,  3,  5,  7, 11, 13, 17, 19, 23, 29, 
    31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 
    73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127
]

# Print the first 5 elements
print("First 5 primes:", prime_numbers[:5])

# Print the last 5 elements
print("Last 5 primes:", prime_numbers[-5:])

# Print the list reversed
print("Reversed list:", prime_numbers[::-1])

# Print every second element
print("Every second prime:", prime_numbers[::2])

# Print every third element
print("Every third prime:", prime_numbers[::3])

First 5 primes: [2, 3, 5, 7, 11]
Last 5 primes: [103, 107, 109, 113, 127]
Reversed list: [127, 113, 109, 107, 103, 101, 97, 89, 83, 79, 73, 71, 67, 61, 59, 53, 47, 43, 41, 37, 31, 29, 23, 19, 17, 13, 11, 7, 5, 3, 2]
Every second prime: [2, 5, 11, 17, 23, 31, 41, 47, 59, 67, 73, 83, 97, 103, 109, 127]
Every third prime: [2, 7, 17, 29, 41, 53, 67, 79, 97, 107, 127]
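
All five slices follow the general form `sequence[start:stop:step]`, where any omitted part falls back to its default. The sketch below repeats the same patterns on a small hypothetical list, `numbers`, so the effect of each part is easier to see.

```python
numbers = list(range(10))  # hypothetical sample data: the integers 0 through 9

first_three = numbers[:3]       # start omitted: begins at index 0
last_three = numbers[-3:]       # negative start: counts from the end
reversed_copy = numbers[::-1]   # step of -1: walks the list backwards
every_second = numbers[::2]     # step of 2: keeps indexes 0, 2, 4, ...
every_third = numbers[::3]      # step of 3: keeps indexes 0, 3, 6, ...

print("First 3:", first_three)
print("Last 3:", last_three)
print("Reversed:", reversed_copy)
print("Every second:", every_second)
print("Every third:", every_third)
```

Each slice produces a new list, so `numbers` itself is left unchanged.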


#### Section 7. 

Build a dictionary that contains the following information about this class (with appropriate names as keys): <br>

* The name <br>
* The course number <br>
* The semester/term in which you are taking this course <br>
* The number of credit hours this course counts for <br>
* A list of the course learning objectives <br>

The majority of this information can be found in the syllabus. Print the dictionary. <br>

In [49]:
class_info = {
    "name": "Web Mining & Applied Natural Language Processing (NLP)",
    "c_number": "44-620",
    "s_term": "Summer 2025",
    "c_hours": "3 Credits",
    "learn_obj": [
        "L01. Manage Python libraries and packages.",
        "L02. Interact with Hosted Version Control Systems (e.g. Git and GitHub).",
        "L03. Programmatically obtain and transform data from web-based APIs and HTML pages into a usable form.",
        "L04. Describe the steps in a basic Natural Language Processing Pipeline.",
        "L05. Use preexisting tools and software libraries to perform some Natural Language Processing, such as sentiment analysis.",
        "L06. Explain results and conclusions drawn from the visualized."
    ]
}

print(class_info)

{'name': 'Web Mining & Applied Natural Language Processing (NLP)', 'c_number': '44-620', 's_term': 'Summer 2025', 'c_hours': '3 Credits', 'learn_obj': ['L01. Manage Python libraries and packages.', 'L02. Interact with Hosted Version Control Systems (e.g. Git and GitHub).', 'L03. Programmatically obtain and transform data from web-based APIs and HTML pages into a usable form.', 'L04. Describe the steps in a basic Natural Language Processing Pipeline.', 'L05. Use preexisting tools and software libraries to perform some Natural Language Processing, such as sentiment analysis.', 'L06. Explain results and conclusions drawn from the visualized.']}
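
Printing the dictionary directly puts everything on one long line. As an optional sketch, the loop below prints one key per line and expands list values into indented items. The dictionary here is an abbreviated copy (three keys, two objectives) used only to keep the example short.

```python
# Abbreviated copy of the dictionary above, for illustration only.
class_info = {
    "name": "Web Mining & Applied Natural Language Processing (NLP)",
    "c_number": "44-620",
    "learn_obj": [
        "L01. Manage Python libraries and packages.",
        "L04. Describe the steps in a basic Natural Language Processing Pipeline.",
    ],
}

lines = []
for key, value in class_info.items():
    if isinstance(value, list):
        # Expand list values into one indented line per item.
        lines.append(f"{key}:")
        lines.extend(f"  - {item}" for item in value)
    else:
        lines.append(f"{key}: {value}")

print("\n".join(lines))
```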


#### Section 8.  

Given the dictionary defined in the code cell below, print the list of level 3 spells the character has. <br>

In [50]:
player_character = {'name': 'Kitab',
                   'class': [('Cleric: Knowledge', 7)],
                   'spells': {'cantrip': ['Guidance', 'Light', 'Thaumaturgy', 'Toll the Dead', 'Word of Radiance'],
                             'level 1': ['Command', 'Detect Magic', 'Healing Word', 'Identify', 'Sleep'],
                             'level 2': ['Augury', 'Calm Emotions', 'Command', 'Invisibility', 'Lesser Restoration'],
                             'level 3': ['Mass Healing Word', 'Nondetection', 'Revivify', 'Feign Death', 'Speak with Dead'],
                             'level 4': ['Banishment', 'Confusion']}
                   }

# Access and print the list of level 3 spells
level_3_spells = player_character['spells']['level 3']
print("Level 3 Spells:", level_3_spells)

Level 3 Spells: ['Mass Healing Word', 'Nondetection', 'Revivify', 'Feign Death', 'Speak with Dead']
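
Chained indexing such as `player_character['spells']['level 3']` raises a `KeyError` if either key is missing. The sketch below shows a more forgiving lookup with `dict.get`; `spells_at` is a hypothetical helper, and the dictionary is a trimmed copy of the one above.

```python
# Trimmed copy of the character dictionary, for illustration only.
player_character = {
    'name': 'Kitab',
    'spells': {
        'level 3': ['Mass Healing Word', 'Nondetection', 'Revivify',
                    'Feign Death', 'Speak with Dead'],
        'level 4': ['Banishment', 'Confusion'],
    },
}


def spells_at(character, level):
    """Return the spell list for a level, or an empty list if the level is absent."""
    return character.get('spells', {}).get(level, [])


print("Level 3 Spells:", spells_at(player_character, 'level 3'))
print("Level 9 Spells:", spells_at(player_character, 'level 9'))
```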


#### Section 9. 

Write code to determine the number of unique elements in the list below.  You MUST use a set in finding your solution.  Print the number of unique values in the list with an appropriate label. <br>

In [51]:
# List of values
values = [10, 11, 10, 8, 1, 12, 0, 1, 6, 5, 5, 13, 6, 15, 0, 0, 1, 1, 9, 7]

# Use set to find unique elements
unique_values = set(values)

# Print the number of unique values
print(f"Number of unique values: {len(unique_values)}")

Number of unique values: 12
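
Sets can also show which values are repeated, not just how many distinct values exist. The following sketch tracks two sets while walking the same list; the names `seen` and `duplicates` are chosen here for illustration.

```python
values = [10, 11, 10, 8, 1, 12, 0, 1, 6, 5, 5, 13, 6, 15, 0, 0, 1, 1, 9, 7]

seen = set()        # every distinct value encountered so far
duplicates = set()  # values encountered more than once

for value in values:
    if value in seen:
        duplicates.add(value)
    else:
        seen.add(value)

print(f"Number of unique values: {len(seen)}")
print(f"Values that appear more than once: {sorted(duplicates)}")
```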


#### Section 10. 

Create a new Jupyter Notebook (the name of the notebook should be your S number). Add a Markdown cell that contains your name. Add a Code cell and write Python that uses loops to draw the following pattern: <br>

```
*      *
**    **
***  ***
********
```

Make sure to add and submit both the new notebook and the changes to this notebook for this assignment. <br>
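
One possible way to draw the pattern, offered as a sketch rather than the required solution: row `i` consists of `i` stars, a gap of spaces, and another `i` stars, with the gap shrinking by two on each row. The variable names `height`, `width`, and `rows` are assumptions made for this example.

```python
height = 4           # number of rows in the pattern
width = 2 * height   # the last row is completely filled with stars

rows = []
for i in range(1, height + 1):
    stars = "*" * i
    gap = " " * (width - 2 * i)   # the gap narrows by two characters per row
    rows.append(stars + gap + stars)

for row in rows:
    print(row)
```

Changing `height` scales the pattern, since the width and gap are derived from it.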