## For an unsolvable system of linear equations, Ax = b, there exists a best (least-squares) solution, x_hat, 
## that minimizes ||b - Ax||. It satisfies p = Ax_hat, where p is the projection of b onto the column 
## space of A. Since e = b - p is orthogonal to the column space of A (i.e., belongs to the left-nullspace of A), 
## it follows that A'(b - Ax_hat) = 0, so A'Ax_hat = A'b, which can be solved for x_hat (uniquely when the columns of A are independent).
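
The projection argument above can be checked numerically on a small system. This is a minimal sketch, assuming a hypothetical 3x2 matrix `A` and vector `b` chosen only for illustration; it solves the normal equations and confirms that the error vector is orthogonal to the columns of `A`.

```python
import numpy as np

# Hypothetical 3x2 system with no exact solution (values chosen for illustration)
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Normal equations: A'A x_hat = A'b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

p = A @ x_hat   # projection of b onto the column space of A
e = b - p       # error vector

print(x_hat)    # least-squares coefficients
print(A.T @ e)  # numerically zero: e is orthogonal to every column of A
```

For this choice of `A` and `b`, working the normal equations by hand gives x_hat = (5, -3), p = (5, 2, -1), and e = (1, -2, 1).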

https://lightning.ai/pages/category/education/lightning-bits/ (EP 06-09)

<u>**Version Control Process**</u>
- put folder under source control 
- create .gitignore 
- add everything else 
- make and view changes 
- commit changes

<u>Creating Git Repository</u>
- go to CLI 
- navigate to working directory >> cd <path> 
- initialize git repo >> git init
- add files to track >> git add <file_name_1 file_name_2 ... file_name_n>
- commit changes >> git commit -am "<commit_message>"
- create .gitignore >> touch .gitignore
    - specify files to be ignored >> open .gitignore --> <file_names_to_ignore> --> save
- add .gitignore file (only untracked file in >> git status) >> git add .
- commit changes >> git commit -am "<commit_message>"

<u>Editing Files</u>
- make change to file in repo and save
- see that changes were made >> git status
    - see the differences >> git diff
- commit changes >> git commit -am "<commit_message>"

<u>Branching</u>
- view all branches and current branch >> git branch
- create branch >> git branch <created_branch_name>
- move to new branch >> git checkout <created_branch_name>
    - checkout <=> switch
- make changes in new branch, see changes and diff >> git status >> git diff
- commit changes >> git commit -am "<commit_message>"
- Note: at this point, two versions exist, one in main and one in new branch
- move changes back to main branch and delete created branch
    - move to main branch >> git checkout main
    - merge created branch >> git merge <created_branch_name>
        - could also rebase (replays the branch's commits one by one on top of main)
    - delete created branch >> git branch -d <created_branch_name>
        - -d refuses to delete an unmerged branch; -D forces deletion

<u>**GitHub**</u>
- create github repo
- link local git controlled repo
    - checks >> git status >> pwd
    - paste github instructions
- add collaborators
    - settings --> collaborators --> add people
- _Collaboration Workflow_
    - git clone <repo_link>
    - everyone creates a new branch for each change
    - make changes, add, commit
    - push to github >> git push origin <branch_name>
    - make pull request, get feedback
        - recipient of pull request:
            - review, suggest change, leave comment (github ui)
    - implement agreed upon changes from branch --> >> git add . >> git commit -am "<commit_message>" >> git push origin <branch_name>
        - repo owner upon receipt (github ui):
            - squash all commits into one: merge pull request dropdown --> squash and merge
            - delete branch 



In [1]:
import pandas as pd
import numpy as np

In [15]:
seed = 311
rng = np.random.default_rng(seed) 

In [16]:
df = pd.DataFrame(rng.integers(0,100,size=(100, 4)), columns=['x1','x2','x3','x4'])
df

Unnamed: 0,x1,x2,x3,x4
0,34,15,23,56
1,8,64,4,70
2,46,13,6,79
3,60,98,43,8
4,2,18,33,8
...,...,...,...,...
95,0,68,99,27
96,33,59,21,33
97,7,79,49,84
98,6,57,0,43


In [17]:
correlations = rng.uniform(low=-1.0, high=1.0, size=4)
correlations

array([-0.16206972, -0.84142446,  0.76057477, -0.11313913])

In [18]:
# Adapted from https://stackoverflow.com/questions/42902938/create-correlated-pandas-series

from scipy.stats import pearsonr
from scipy.optimize import minimize

# Original Stack Overflow example:
# data = pd.DataFrame({'Country A': [10, 11, 10, 9]})
# data['Country B'] = minimize(lambda x: abs(0.8 - pearsonr(data['Country A'], x)[0]), 
#                              np.random.rand(len(data['Country A']))).x

# For each feature x_i, start from a random vector and minimize the gap between
# its Pearson correlation with x_i and the target correlations[i-1]; scale the result by 100.
for i, target in enumerate(correlations, start=1):
    result = minimize(lambda v: abs(target - pearsonr(df[f'x{i}'], v)[0]),
                      rng.random(len(df)))
    df[f'y{i}'] = result.x * 100

display(df)

Unnamed: 0,x1,x2,x3,x4,y1,y2,y3,y4
0,34,15,23,56,30.932101,112.017972,17.880825,61.032801
1,8,64,4,70,79.810392,21.954097,-22.189555,90.086938
2,46,13,6,79,54.220147,87.188650,-11.496161,17.253994
3,60,98,43,8,83.510432,31.501714,41.675982,45.893792
4,2,18,33,8,93.021677,69.267574,20.690862,58.665601
...,...,...,...,...,...,...,...,...
95,0,68,99,27,45.898742,8.943823,107.231830,73.786603
96,33,59,21,33,6.484989,62.927989,27.221923,64.956593
97,7,79,49,84,88.273342,35.887342,75.935026,-2.698874
98,6,57,0,43,93.087366,7.637831,-33.838693,32.401276
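
The optimizer above searches for a vector with the target correlation. As an alternative (not the approach used in this notebook), a vector with a prescribed sample correlation can be built directly: mix the standardized feature with noise that has been made orthogonal to it. The sketch below is illustrative only; `correlated_with`, `rng_demo`, `x_demo`, and the target value -0.84 are all assumptions introduced for the example.

```python
import numpy as np

def correlated_with(x, r, rng):
    """Return a vector whose sample Pearson correlation with x is r (up to rounding).

    Hypothetical helper for illustration; not part of the notebook above.
    """
    x = np.asarray(x, dtype=float)
    xs = x - x.mean()
    xs /= np.linalg.norm(xs)            # centered, unit-length copy of x

    z = rng.standard_normal(len(x))
    z -= z.mean()                       # center the noise
    z -= (z @ xs) * xs                  # remove the component along x
    z /= np.linalg.norm(z)              # unit length, orthogonal to xs

    # Unit-length mix: the share of xs in the result is exactly r
    return r * xs + np.sqrt(1.0 - r**2) * z

rng_demo = np.random.default_rng(0)     # assumed seed, separate from the notebook's rng
x_demo = rng_demo.integers(0, 100, size=100)
y_demo = correlated_with(x_demo, -0.84, rng_demo)
r_demo = np.corrcoef(x_demo, y_demo)[0, 1]
print(r_demo)                           # matches the target up to floating-point error
```

The returned vector has zero mean and unit length; adding a constant or multiplying by a positive factor (such as the `* 100` used above) leaves the correlation unchanged.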


In [19]:
# Sum the four correlated components into a single label, rounded to the nearest integer
df['y'] = np.round(df['y1'] + df['y2'] + df['y3'] + df['y4'], 0)
df = df.drop(columns=['y1', 'y2', 'y3', 'y4'])
df

Unnamed: 0,x1,x2,x3,x4,y
0,34,15,23,56,222.0
1,8,64,4,70,170.0
2,46,13,6,79,147.0
3,60,98,43,8,203.0
4,2,18,33,8,242.0
...,...,...,...,...,...
95,0,68,99,27,236.0
96,33,59,21,33,162.0
97,7,79,49,84,197.0
98,6,57,0,43,99.0


In [20]:
def linear_regression(df: pd.DataFrame, label: str):
    """Least-squares fit via the normal equations A'A x = A'b.

    Returns the coefficient vector (intercept first) and the sum of squared errors.
    """

    # Dependent variable vector b and feature columns
    # (drop returns a new frame, so the caller's df is not modified)
    b = df[label].to_numpy()
    features = df.drop(columns=label)

    # Prepend an intercept column of ones unless the first column already is one
    if not (features.iloc[:, 0] == 1).all():
        features.insert(loc=0, column='x0', value=np.ones(len(features)))

    A = features.to_numpy()

    # Solve A_T_A x = A_T b for x
    A_T_A = A.T @ A
    A_T_b = A.T @ b
    x = np.linalg.solve(A_T_A, A_T_b)

    # Projection p = Ax, error e = b - p, and sse = ||e||^2
    p = A @ x
    e = b - p
    sse = (np.linalg.norm(e))**2.

    # Return coefficients for best solution and sse
    return x, sse

In [21]:
linear_regression(df,'y')

(array([ 2.09151056e+02, -3.95840962e-02, -1.31154935e+00,  9.43506590e-01,
         4.85673563e-02]),
 310934.41976584785)
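
As a cross-check on the normal-equations approach, the same fit can be computed with `np.linalg.lstsq`, which solves the least-squares problem without forming A'A and is generally the more robust choice when the columns of A are nearly dependent. The sketch below uses synthetic data; `rng_demo`, `X`, and the coefficients used to build `b` are assumptions for illustration, not the notebook's values.

```python
import numpy as np

rng_demo = np.random.default_rng(0)                 # assumed seed for the sketch
X = rng_demo.integers(0, 100, size=(100, 4)).astype(float)
A = np.column_stack([np.ones(len(X)), X])           # intercept column first
b = X @ np.array([0.5, -1.0, 2.0, 0.0]) + 10 * rng_demo.normal(size=100)

# Normal equations, as in linear_regression
x_normal = np.linalg.solve(A.T @ A, A.T @ b)
sse_normal = np.linalg.norm(b - A @ x_normal) ** 2

# Library least-squares solver; residuals holds the sum of squared errors
x_lstsq, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))               # both routes give the same coefficients
print(np.isclose(sse_normal, residuals[0]))         # and the same SSE
```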

In [10]:
correlations

array([-0.16206972, -0.84142446,  0.76057477, -0.11313913])

In [23]:
'''
>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
>>> # y = 1 * x_0 + 2 * x_1 + 3
>>> y = np.dot(X, np.array([1, 2])) + 3
>>> reg = LinearRegression().fit(X, y)
>>> reg.score(X, y)
1.0
>>> reg.coef_
array([1., 2.])
>>> reg.intercept_
3.0...
>>> reg.predict(np.array([[3, 5]]))
array([16.])
'''
from sklearn.linear_model import LinearRegression
reg = LinearRegression().fit(df[['x1','x2','x3','x4']].to_numpy(), df['y'].to_numpy())
print(reg.intercept_)
reg.coef_

209.1510558632482


array([-0.0395841 , -1.31154935,  0.94350659,  0.04856736])
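
The two results can be compared numerically rather than by eye. This sketch copies the printed values from the outputs above (so it is limited to the displayed precision) and checks that the normal-equations coefficients agree with sklearn's intercept and coefficients; the variable names `normal_eq` and `sklearn_fit` are introduced here for illustration.

```python
import numpy as np

# Values copied from the linear_regression output above
normal_eq = np.array([2.09151056e+02, -3.95840962e-02, -1.31154935e+00,
                      9.43506590e-01, 4.85673563e-02])

# sklearn's intercept_ followed by coef_, as printed above
sklearn_fit = np.array([209.1510558632482, -0.0395841, -1.31154935,
                        0.94350659, 0.04856736])

# Tolerance loosened to allow for the rounding in the printed values
print(np.allclose(normal_eq, sklearn_fit, rtol=1e-6))
```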