
# Analytics Programming and Data Visualisation Class Test
**Name:** Matta Akhil  
**Student ID:** 23389605  
**Date:** 27/02/2025  
**Course:** Analytics Programming and Data Visualisation (MSCDAD_B_JAN25I)


## Question-1

a) Create a function that accepts two arguments. The first argument is the name of the file you will try to load/parse,
and the second argument is an integer that indicates the number of Nobel Prizes in a specific category and year (for
example, Physics in 2006). The default value is set to 50. Ideally, you should retrieve a random sample of that size.
The function should return a data structure (such as a list) that contains the specified number of random Nobel Prizes
in the given category and year. Your function should include appropriate exception handling clauses to recover from
various common problems. Explain the handlers used and the effect of their declaration order. For this task, you are
not allowed to use the Pandas library.

In [62]:
import numpy as np
import json
import random
import csv


def my_json_function(file_name, categeory_year, num_prizes=50):
    """This function reads the data from provided file and calls get_filtered_data"""
    try:
        with open(file_name) as f:
            data = json.load(f)
        get_filtered_data(data["prizes"], categeory_year, num_prizes)
    except FileNotFoundError:
            # Handling File not Found Error
        return f"{file_name} is not found."
    except json.JSONDecodeError:
            # Handling Json Decode Error
        return f"Failed to decode JSON from the file '{file_name}.'"
    except KeyError:
        # Handling missing keys in the data
        print(" Missing 'category' or 'year' in the categeory_year.")
    except ValueError:
        # Handling value error
        print("Error: Requested sample size is larger than the available data.")
    except Exception as e:
        return f"Unexpected error occured: {e}."

def get_filtered_data(data, categeory_year, num_prizes):
    """function returns a data structure (such as a list) that contains the specified number of random Nobel Prizes"""
    filtered_data = []
    try:
        for each in data:
            if each["category"] == categeory_year[0] and each["year"] == str(categeory_year[1]):
                filtered_data.append(each)
        # Step 3: Return a random sample of the specified number of prizes
        if len(filtered_data) < num_prizes:
            print(f"Only {len(filtered_data)} Nobel Prizes found for the given category and year.")
        return random.sample(filtered_data, min(len(filtered_data), num_prizes))
    except KeyError:
        raise
    except ValueError:
        raise
    except Exception as e:
        raise




my_json_function("./nobel_prizes.json", ["physics", 2006])

Only 1 Nobel Prizes found for the given category and year.


b) values
Using a loop structure, find the Nobel prizes after or equal to 1950 on physics and based upon the following
1. year
2. category
3. laureates
print the following formatted message with the winners
2006 physics
1) John C. Mather in "for their discovery of the blackbody form and anisotropy of the cosmic microwave
background radiation"
2) George F. Smoot in "for their discovery of the blackbody form and anisotropy of the cosmic microwave
background radiation"
1986 physics
1) Ernst Ruska in "for his fundamental work in electron optics, and for the design of the first electron microscope"
2) Gerd Binnig in "for their design of the scanning tunneling microscope"
3) Heinrich Rohrer in "for their design of the scanning tunneling microscope"
….

In [63]:
def my_json_retrieve_function(file_name):
    """This function reads the data from provided file and calls get_filtered_data"""
    try:
        with open(file_name) as f:
            data = json.load(f)
        get_filtered_data(data["prizes"])
    except FileNotFoundError:
        # Handling File not Found Error
        return f"{file_name} is not found."
    except json.JSONDecodeError:
        # Handling Json Decode Error
        return f"Failed to decode JSON from the file '{file_name}.'"
    except Exception as e:
        # Handling Common Exception
        return f"Unexpected error occured: {e}"

def get_filtered_data(data):
    try:
        for each in data:
            if each["category"] == "physics" and each["year"] >= str(1950):
                print(each['year']," ", each["category"])
                laureates = each['laureates']
                for i in laureates:
                    print(f"{i['firstname']}{i['surname']} in '{i['motivation']}'")
    except Exception as e:
        raise

my_json_retrieve_function("./nobel_prizes.json")

2024   physics
JohnHopfield in 'for foundational discoveries and inventions that enable machine learning with artificial neural networks'
GeoffreyHinton in 'for foundational discoveries and inventions that enable machine learning with artificial neural networks'
2023   physics
PierreAgostini in 'for experimental methods that generate attosecond pulses of light for the study of electron dynamics in matter'
FerencKrausz in 'for experimental methods that generate attosecond pulses of light for the study of electron dynamics in matter'
AnneL’Huillier in 'for experimental methods that generate attosecond pulses of light for the study of electron dynamics in matter'
2022   physics
AlainAspect in 'for experiments with entangled photons, establishing the violation of Bell inequalities and  pioneering quantum information science'
John Clauser in 'for experiments with entangled photons, establishing the violation of Bell inequalities and  pioneering quantum information science'
AntonZeilinger in

c) Using a loop structure, extract the following information from your random data structure (step a)
1. year
2. category
3. laureates
and write it to a csv file using the following structure and column names.
YEAR, CATEGORY, LAUREATES
1973, physics, Leo Esaki & Ivar Giaever & Brian D. Josephson
For this task you can reuse previous steps.
Note, you are not allowed to use the pandas library.

In [64]:
def my_json_retrieve_function(file_name):
    """This function reads the data from provided file and saves required data into csv file"""

    try:
        with open(file_name) as f:
            data = json.load(f)
        convert_to_csv(data["prizes"])
    except FileNotFoundError:
        return f"file: {file_name} passed is not found"
    except json.JSONDecodeError:
        return f"Error: Failed to decode JSON from the file '{file_name}'. Please check the file format."
    except Exception as e:
        return f"Unexpected error occured: {e}"


def convert_to_csv(data):
    # Open CSV file for writing
    with open('nobel_laureates.csv', mode='w', newline='') as file:
        writer = csv.writer(file)

        # Write the header
        writer.writerow(['YEAR', 'CATEGORY', 'LAUREATES'])

        # Loop through the data
        for record in data:
            year = record['year']
            category = record['category']

            # Combine the laureates' names into a single string
            laureates = " & ".join(
                [f"{laureate.get('firstname', 'N/A')} {laureate.get('surname', 'N/A')}" for laureate in
                 record.get('laureates', [])]
            )

            # Write the row to the CSV
            writer.writerow([year, category, laureates])

    print("Data written to 'nobel_laureates.csv'")


my_json_retrieve_function("./nobel_prizes.json")

Data written to 'nobel_laureates.csv'


## Question-2

Create a pandas data frame by loading the provided Football Team Stats.csv file. The data of the file looks like this
a) Use pandas to find the Squads where the Attendance in less than 10000 and greater than 50000.

In [65]:
import pandas as pd

df = pd.read_csv("./Football Team Stats.csv")

squads = df[(df['Attendance'] < 10000) | (df['Attendance'] > 50000)]
data = {'Squad': squads["Squad"],
        'Attendance': squads["Attendance"]}
data = pd.DataFrame(data)
data

Unnamed: 0,Squad,Attendance
0,Barcelona,83148
3,Arsenal,60203
4,Manchester City,53203
5,Real Madrid,57300
6,Dortmund,81199
7,Atletico Madrid,56432
9,Marseille,58623
10,Bayern Munich,75000
13,Manchester Utd,73704
14,Monaco,6498


b) Use pandas to find for each Country the Squad with maximum wins (W)

In [66]:
# Find the squad with the maximum wins for each country
max_wins_df = df.loc[df.groupby('Country')['W'].idxmax()]
print("The squads which have max wins for each country are:")
max_wins_df

The squads which have max wins for each country are:


Unnamed: 0,Rk,Squad,Country,LgRk,MP,W,D,L,GF,GA,GD,Pts,Attendance
3,4,Arsenal,ENG,1,32,23,6,3,77,34,43,75,60203
0,1,Barcelona,ESP,1,29,23,4,2,53,9,44,73,83148
2,3,Paris S-G,FRA,1,32,24,3,5,75,31,44,75,40508
6,7,Dortmund,GER,1,29,19,3,7,66,39,27,60,81199
1,2,Napoli,ITA,1,30,24,3,3,66,21,45,75,25662


## Question-3

a) Create a NumPy array containing all numbers between -100 (included) and 100 (excluded) and with a step factor
of 10. Next, modify the array so that the data is organized as a 2-dimensional array where the number of rows
are 5.

In [67]:

arr = np.arange(-100, 100, 10).reshape(5, -1)
print(arr)

[[-100  -90  -80  -70]
 [ -60  -50  -40  -30]
 [ -20  -10    0   10]
 [  20   30   40   50]
 [  60   70   80   90]]


b) Create a 2D NumPy array (4x4) and slice it to extract the submatrix that includes rows 1 and 2 and columns 2
and 3.

c) Create a 2D NumPy array and find the maximum values along both axis (both row-wise and column-wise).

In [68]:
arr = np.arange(1, 5).reshape(2, 2)
print(arr.max())

4


## Question-4

a) Find all words in text with exactly two or three letters. Test your function with different text inputs.

In [69]:
import re

pattern = r"\b[a-zA-Z]{2,3}\b"
test_texts = [
    "This is a test string with short words and longer words.",
    "Python is great for data analysis and machine learning.",
]

for each in test_texts:
    print(re.findall(pattern, each))

['is', 'and']
['is', 'for', 'and']


b) Create a python program that would extract repeated (duplicate) words from a sentence. The words are
separated with a space or hyphen (-). For example, the following are valid matches
It is also used at the start of every knock knock joke of which there are many.
It is also used at the start of every knock-knock joke of which there are many.
Test your function with different text inputs.