The bias-variance tradeoff is a fundamental concept in machine learning and statistics that deals with the problems of overfitting and underfitting.

Bias: This refers to the error introduced by approximating a real-world problem, which may be complex, with a simplified model. Bias measures how far, on average, the model's predictions are from the correct value. A high-bias model is overly simplistic: it does not learn enough from the training data, which leads to high error on both training and unseen data. This is known as underfitting.

Variance: This refers to the model's sensitivity to fluctuations in the training dataset. A high-variance model fits the training data too closely, noise included, and does not generalize well to data it hasn’t seen before. Such models perform well on training data but have high error rates on test data. This is known as overfitting.

The tradeoff:

A model with high bias and low variance will oversimplify, resulting in systematic errors regardless of how much data we feed it.
A model with high variance and low bias is overly complex, fitting the training data too closely and failing to generalize well to new data.
The goal in most machine learning work is to find a good balance between bias and variance that minimizes the total error. This tradeoff is a key part of model selection: choosing not only the right algorithm but also the right configuration (such as the degree of a polynomial in regression, the depth of a decision tree, or the number of layers in a neural network) so that the model has the right level of complexity.
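
For squared-error loss, the "total error" above is usually written as a sum of three terms. The notation is introduced here for illustration and is not from the notes: the data are assumed to follow y = f(x) + ε with zero-mean noise of variance σ², and f-hat is the model fitted on a random training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Increasing model complexity typically shrinks the first term and grows the second; the noise term cannot be reduced by any choice of model.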

A common way to address the bias-variance tradeoff is through techniques such as cross-validation, where the model's performance is tested on multiple subsets of data to ensure it generalizes well, or regularization, which adds a penalty to more complex models to avoid overfitting.
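
As a rough illustration of the cross-validation idea, here is a minimal base-Python sketch of k-fold cross-validation. The helper names (k_fold_splits, cross_val_mse, fit_mean) are hypothetical, the model is a deliberately high-bias "predict the training mean" baseline, and the sketch assumes k is no larger than the number of samples.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds, round-robin style
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def cross_val_mse(xs, ys, fit, k=5):
    """Average the held-out mean squared error over k folds.

    `fit(train_xs, train_ys)` must return a function that predicts y from x.
    """
    fold_errors = []
    for train_idx, test_idx in k_fold_splits(len(xs), k):
        predict = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        squared = [(predict(xs[i]) - ys[i]) ** 2 for i in test_idx]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / k


def fit_mean(train_xs, train_ys):
    """High-bias baseline: always predict the mean of the training targets."""
    mean_y = sum(train_ys) / len(train_ys)
    return lambda x: mean_y
```

Comparing cross_val_mse across candidate models (or across settings of one model) is one way to pick the level of complexity that generalizes best.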

Question 1.b Live programming


'''
Level 1 - Programming
You are a data scientist for a large retail organization. Currently your task involves mining for insights from Point-of-Sale systems which track item sales. You have data stored
as follows in JSON files (for simplicity, assume that you have one file, 'pos_data.json')
Data Sample:
[{'item': 'Stella Extra Strong', 'price': '$23.45'},
{'item': 'Fosters Mild', 'price': '$12.99'},
{'item': 'Heineken', 'price': '$29.45'},
{'item': 'Stella Extra Strong', 'price': '$23.45'},
{'item': 'Stella Extra Strong', 'price': '$23.45'},
{'item': 'Fosters Mild', 'price': '$12.99'}]
Your tasks are as follows.
1. Write code to read in the JSON file to an appropriate python data structure (to solve Q2)
2. Use base python libraries (not pandas) to get the following insights
 -> Item name with top total sales
 -> Item name with least total sales
 -> Item name with the most units sold
Base Python libraries only (no 3rd party libraries e.g: numpy, pandas)
'''

In [None]:
import json

# Load the point-of-sale records into a list of dicts; the context manager closes the file
with open('pos_data.json') as f:
    sales = json.load(f)



Note: practice opening different types of files.
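
One case worth practicing: the data sample in the task is written with single quotes, which is Python literal syntax rather than strict JSON, so json.load would reject a file stored exactly that way. The sketch below is one possible way to handle both forms; parse_records and load_records are hypothetical helper names, not part of the task.

```python
import ast
import json


def parse_records(text):
    """Parse sale records from strict JSON or from a Python-literal list."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Single-quoted dicts are not valid JSON; literal_eval parses them
        # safely without executing arbitrary code
        return ast.literal_eval(text)


def load_records(path):
    """Read the file at `path` and parse its contents into a list of dicts."""
    with open(path, encoding='utf-8') as f:
        return parse_records(f.read())
```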

In [4]:
sales = [{'item': 'Stella Extra Strong', 'price': '$23.45'},
{'item': 'Fosters Mild', 'price': '$12.99'},
{'item': 'Heineken', 'price': '$29.45'},
{'item': 'Stella Extra Strong', 'price': '$23.45'},
{'item': 'Stella Extra Strong', 'price': '$23.45'},
{'item': 'Fosters Mild', 'price': '$12.99'}]

1. Write code to read in the JSON file to an appropriate python data structure (to solve Q2)

In [5]:
# Inspect the raw price strings; each carries a '$' prefix that must be stripped before converting to float

for item in sales:
    print(item['price'])
    

$23.45
$12.99
$29.45
$23.45
$23.45
$12.99
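
The printed prices are strings, so they need converting before any arithmetic. A minimal sketch of that step follows; parse_price is a hypothetical helper name, and the handling of thousands separators is an assumption that the sample data does not exercise.

```python
def parse_price(price):
    """Convert a price string such as '$23.45' to a float."""
    # Drop the currency symbol and any thousands separators (assumed format)
    return float(price.replace('$', '').replace(',', ''))


example_price = parse_price('$23.45')
```

For real currency work, decimal.Decimal avoids binary floating-point rounding, but float matches what the cell above sets out to do.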


In [13]:
#Create a list of all the items

items_list = [item['item'] for item in sales]

In [14]:
# Create a set of the unique item names (a set comprehension, not a list of per-character sets)
items_set = {item['item'] for item in sales}
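
From here, the three insights in task 2 can be computed with base Python alone. The following is a sketch rather than a reference solution: summarize_sales and sample_sales are hypothetical names, each record is assumed to represent one unit sold, and ties go to the item seen first.

```python
from collections import defaultdict

# Sample records repeated here so the block runs on its own
sample_sales = [
    {'item': 'Stella Extra Strong', 'price': '$23.45'},
    {'item': 'Fosters Mild', 'price': '$12.99'},
    {'item': 'Heineken', 'price': '$29.45'},
    {'item': 'Stella Extra Strong', 'price': '$23.45'},
    {'item': 'Stella Extra Strong', 'price': '$23.45'},
    {'item': 'Fosters Mild', 'price': '$12.99'},
]


def summarize_sales(records):
    """Return the items with the top/least total sales and the most units sold."""
    revenue = defaultdict(float)  # item name -> total sales value
    units = defaultdict(int)      # item name -> number of units sold
    for record in records:
        price = float(record['price'].replace('$', '').replace(',', ''))
        revenue[record['item']] += price
        units[record['item']] += 1  # assumption: one record is one unit
    return {
        'top_total_sales': max(revenue, key=revenue.get),
        'least_total_sales': min(revenue, key=revenue.get),
        'most_units_sold': max(units, key=units.get),
    }


summary = summarize_sales(sample_sales)
print(summary)
```

On the sample data, Stella Extra Strong has both the top total sales and the most units sold, and Fosters Mild has the least total sales (two units at $12.99 total less than one Heineken at $29.45).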