# Data Science Exploration of Physics Concepts using Linear Regression

## Objective


Investigate the application and limitations of linear regression in estimating physical constants, focusing on data science techniques rather than deep physics knowledge.


## Your Task

Explore the two physical relationships described below from a data science perspective, focusing on how linear regression performs in each case.

## Tools

Python with libraries for data analysis and visualization (e.g., pandas, numpy, matplotlib, seaborn, sklearn)

## Background for Data Scientists


We'll explore two physics scenarios:




### The Photoelectric Effect:

When light shines on certain materials, electrons are emitted. This is called the photoelectric effect.  
The maximum kinetic energy (K) of emitted electrons is related to the frequency of light (f) by the equation:

$ K = hf - φ $

Here, **h** is Planck's constant (a fundamental constant of nature), and **φ** is the work function (a property of the material).  
This relationship is linear, with **h** as the slope and **-φ** as the y-intercept.
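As a quick sanity check of the slope and intercept interpretation, the sketch below fits a straight line to noise-free values of K using `np.polyfit`. The work function of 3e-19 J is an arbitrary example value chosen for illustration, not one prescribed by this notebook.

```python
import numpy as np

h = 6.626e-34   # Planck's constant in J⋅s
phi = 3e-19     # example work function in J (assumed value, for illustration only)

f = np.linspace(1e14, 1e15, 50)   # frequencies in Hz
K = h * f - phi                   # noise-free kinetic energies in J

# A degree-1 polynomial fit returns the coefficients as [slope, intercept]
slope, intercept = np.polyfit(f, K, 1)

print(f"slope     = {slope:.4e}   (h    = {h:.4e})")
print(f"intercept = {intercept:.4e}   (-phi = {-phi:.4e})")
```

With no noise, the fitted slope should match h and the fitted intercept should match -φ up to floating-point error.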



### The Simple Pendulum:

A simple pendulum is a weight hanging from a string that swings back and forth.  
The time it takes for one complete swing (the period, T) is related to the length of the pendulum (L) by the equation:

$ T = 2\pi\sqrt{\frac{L}{g}} $

Here, **g** is the acceleration due to gravity (approximately 9.8 m/s² on Earth).  
This relationship is not linear due to the square root.
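
Although T is not linear in L, squaring both sides of the equation gives a form that is linear in L. This is one possible way (not the only one) to bring the pendulum data within reach of a straight-line fit:

$ T^2 = \frac{4\pi^2}{g} L $

Under this transformation, a plot of T² against L is a line through the origin with slope $4\pi^2/g$, so g can be recovered as $g = 4\pi^2 / \text{slope}$. For g = 9.8 m/s², the slope is about 4.03 s²/m.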


# Part I: Data Generation

## The Photoelectric Effect (Linear Relationship):

### a. Choose your parameters:

- **h (Planck's constant):** Use the actual value, $6.626 \times 10^{-34}$ J⋅s
- **φ (work function):** Choose a value between $1 \times 10^{-19}$ and $5 \times 10^{-19}$ J

### b. Generate data:

1. **Create an array of 50 frequency (f) values** between $1 \times 10^{14}$ and $1 \times 10^{15}$ Hz.
2. **Calculate the corresponding K values** using the equation $K = hf - φ$.
3. **Add random noise to K values** to simulate experimental uncertainty. Use numpy's random module to add Gaussian noise with a standard deviation of about 1-5% of the magnitude of each K value. K is negative at frequencies where hf < φ, and a standard deviation cannot be negative, so scale the noise by |K|.

### Python snippet to get you started:


In [None]:
import numpy as np

# Set your parameters
h = 6.626e-34  # Planck's constant in J⋅s
phi = 3e-19  # Work function in J (example value; choose your own between 1e-19 and 5e-19)

# Generate frequency values
f = np.linspace(1e14, 1e15, 50)

# Calculate K values
K = h * f - phi

# Add noise
noise_level = 0.03  # Example value; choose your own between 0.01 and 0.05
# The noise scale must be non-negative, and K < 0 wherever h*f < phi, so use |K|
K_with_noise = K + np.random.normal(0, noise_level * np.abs(K), K.shape)

## The Simple Pendulum (Non-linear Relationship):

### a. Choose your parameters:

- **g (acceleration due to gravity):** Use 9.8 m/s²

### b. Generate data:

1. **Create an array of 50 length (L) values** between $0.1$ and $2$ meters.
2. **Calculate the corresponding T values** using the equation $T = 2\pi\sqrt{\frac{L}{g}}$.
3. **Add random noise to T values** to simulate experimental uncertainty. Use numpy's random module to add Gaussian noise with a standard deviation of about 1-5% of the T values.


In [None]:
import numpy as np

# Set your parameters
g = 9.8  # acceleration due to gravity in m/s²

# Generate length values
L = np.linspace(0.1, 2, 50)

# Calculate T values
T = 2 * np.pi * np.sqrt(L / g)

# Add noise
noise_level = 0.03  # Example value; choose your own between 0.01 and 0.05
T_with_noise = T + np.random.normal(0, noise_level * T, T.shape)

For both scenarios:

1. Create pandas DataFrames to store your generated data.
2. Visualize your data using scatter plots. Use matplotlib or seaborn for this.
3. Examine your plots. How do they differ? What do these differences suggest about the underlying relationships?

Remember, while we're providing a structure for data generation, you should experiment with different noise levels and ranges to understand how these choices affect your subsequent analysis.
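
The steps above can be sketched as follows. This is a minimal illustration, not a reference solution: the work function (3e-19 J), the noise level (3%), the random seed, and the DataFrame column names are all assumptions made for this example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)  # fixed seed (an assumption) so the run is reproducible

# Photoelectric data; phi and noise_level are example choices
h, phi, noise_level = 6.626e-34, 3e-19, 0.03
f = np.linspace(1e14, 1e15, 50)
K = h * f - phi
# K is negative wherever h*f < phi, so the noise scale uses |K|
K_noisy = K + rng.normal(0, noise_level * np.abs(K))

# Pendulum data
g = 9.8
L = np.linspace(0.1, 2, 50)
T = 2 * np.pi * np.sqrt(L / g)
T_noisy = T + rng.normal(0, noise_level * T)

# Hypothetical column names; pick whatever is clearest for your analysis
photo_df = pd.DataFrame({"frequency_hz": f, "kinetic_energy_j": K_noisy})
pendulum_df = pd.DataFrame({"length_m": L, "period_s": T_noisy})

# Side-by-side scatter plots of the two datasets
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4))
ax1.scatter(photo_df["frequency_hz"], photo_df["kinetic_energy_j"], s=15)
ax1.set(xlabel="Frequency f (Hz)", ylabel="Max kinetic energy K (J)",
        title="Photoelectric effect")
ax2.scatter(pendulum_df["length_m"], pendulum_df["period_s"], s=15)
ax2.set(xlabel="Length L (m)", ylabel="Period T (s)", title="Simple pendulum")
fig.tight_layout()
```

In a notebook the figure renders when the cell finishes; in a plain script, call `plt.show()` at the end.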

# Part II: Linear Regression Application
For both scenarios:

1. Plot the Data (matplotlib & seaborn)
2. Apply linear regression to your data. (use sklearn)
3. Visualize the regression line alongside your data points. (matplotlib & seaborn)
4. Extract the slope and intercept. What does each represent in each scenario? In particular, which physical quantity does the slope estimate?
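
One possible way to carry out these steps with scikit-learn is sketched below. The data are regenerated inside the block so that it runs on its own; the work function, noise level, and seed are example choices, and the variable names are only illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)  # fixed seed, used here only for reproducibility

# Regenerate both datasets (phi = 3e-19 J and 3% noise are example choices)
h, phi, g, noise_level = 6.626e-34, 3e-19, 9.8, 0.03
f = np.linspace(1e14, 1e15, 50)
K = h * f - phi
K_noisy = K + rng.normal(0, noise_level * np.abs(K))
L = np.linspace(0.1, 2, 50)
T = 2 * np.pi * np.sqrt(L / g)
T_noisy = T + rng.normal(0, noise_level * T)

# scikit-learn expects a 2-D feature array of shape (n_samples, n_features)
f_2d = f.reshape(-1, 1)
L_2d = L.reshape(-1, 1)
photo_model = LinearRegression().fit(f_2d, K_noisy)
pend_model = LinearRegression().fit(L_2d, T_noisy)

# Photoelectric effect: the slope estimates h, the intercept estimates -phi
h_est = photo_model.coef_[0]
phi_est = -photo_model.intercept_
print(f"h estimate:   {h_est:.3e} J⋅s (true value {h:.3e})")
print(f"phi estimate: {phi_est:.3e} J (true value {phi:.3e})")

# Pendulum: the slope and intercept of T vs. L have no direct physical meaning
print(f"pendulum slope: {pend_model.coef_[0]:.3f} s/m, "
      f"intercept: {pend_model.intercept_:.3f} s")

# Overlay each fitted line on its data
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4))
ax1.scatter(f, K_noisy, s=15, label="data")
ax1.plot(f, photo_model.predict(f_2d), color="red", label="linear fit")
ax1.set(xlabel="Frequency f (Hz)", ylabel="K (J)", title="Photoelectric effect")
ax1.legend()
ax2.scatter(L, T_noisy, s=15, label="data")
ax2.plot(L, pend_model.predict(L_2d), color="red", label="linear fit")
ax2.set(xlabel="Length L (m)", ylabel="Period T (s)", title="Simple pendulum")
ax2.legend()
fig.tight_layout()
```

The reshape is needed because `LinearRegression.fit` takes a 2-D feature matrix even when there is only one feature.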



## Questions to consider:

+ How well does the linear model fit each dataset?
+ What metrics could you use to quantify the goodness of fit?
+ For the pendulum data, is linear regression appropriate? If not, why?
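
As a hedged illustration of these questions for the pendulum, the sketch below computes R² and RMSE for a straight-line fit of T against L, inspects a residual, and compares this with a fit of T² against L, which is linear in theory because T² = (4π²/g)L. The noise level and seed are example choices. The same metrics can be applied to the photoelectric fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(1)  # fixed seed, used here only for reproducibility

g, noise_level = 9.8, 0.03      # 3% noise is an example choice
L = np.linspace(0.1, 2, 50)
T = 2 * np.pi * np.sqrt(L / g)
T_noisy = T + rng.normal(0, noise_level * T)
X = L.reshape(-1, 1)

# Straight-line fit of T against L (misspecified: the true relation is a square root)
raw_model = LinearRegression().fit(X, T_noisy)
raw_pred = raw_model.predict(X)
raw_r2 = r2_score(T_noisy, raw_pred)
raw_rmse = np.sqrt(mean_squared_error(T_noisy, raw_pred))
residuals = T_noisy - raw_pred  # look for systematic curvature, not just size

# Straight-line fit of T**2 against L (theoretical slope is 4*pi**2 / g)
T_sq = T_noisy ** 2
sq_model = LinearRegression().fit(X, T_sq)
sq_r2 = r2_score(T_sq, sq_model.predict(X))
g_est = 4 * np.pi ** 2 / sq_model.coef_[0]

print(f"T vs L:    R^2 = {raw_r2:.4f}, RMSE = {raw_rmse:.4f} s")
print(f"T^2 vs L:  R^2 = {sq_r2:.4f}")
print(f"Residual at the shortest length: {residuals[0]:.3f} s")
print(f"g estimated from the T^2 fit: {g_est:.2f} m/s^2")
```

R² alone can look high even when the model is misspecified, so a residual plot (residuals against L) is often a more telling diagnostic: random scatter suggests an adequate model, while a curved pattern suggests a missing non-linear term. The RMSE values of the two fits are not directly comparable because T and T² have different units.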

### Plot Data 

In [None]:
### Plot Visualizations of the Photoelectric Data







In [None]:
### Plot Visualizations for the Pendulum Movement





### Regression

In [None]:
# Regression for Photoelectric Effect








In [None]:
# Regression for Pendulum








### Visualization of the Regression Results

In [None]:
### Code for visualizing regression results (Photoelectric)





In [None]:
### Code for visualizing regression results (Pendulum)




