# Instructions

1. Download the provided Jupyter notebook file to your computer.
2. Write all your answers and code into this notebook file.
3. When your work is completed, export your notebook to an HTML file.
4. Submit your HTML file and a copy of the notebook to the assignment page on Moodle.



## Identification

### Your Information

Your Last Name:

Your First Name:


### Group Members (list any classmates you worked with on this problem set)

Your Group Members:

# Textbook Reading

This problem set will largely focus on readings as we build the theory behind unsupervised learning. These readings will be helpful for subsequent exercises.

The textbook reading for this week focuses on the theory and application of Principal Components Analysis (PCA) and dimension reduction. This topic can quickly come to rely on mathematics from the field of linear algebra. While these tools are useful for solving the problem, our focus will be on the interpretation of the results, so do not dwell on the underlying formulas.

1. ISLP - Chapter 6, Sections 6.3 and 6.4. The main focus will be on Section 6.3, but Section 6.4 has good examples of the problems that arise when working in high dimensions.
2. Géron - Chapter 8. The most important part for our purposes is the PCA section, pages 219-225.

The optional readings for the week all use PCA to varying degrees. Feel free to read any that attract your interest. The piece by Vyas and Kumaranayake (2006) provides additional background explanation on PCA construction.

# Additional Materials on PCA

## StatQuest Video

There is a very accessible StatQuest video that goes through some of the formulas behind calculating principal components in lower dimensions. The insights it presents also apply to higher dimensions, and the video is worth watching.

You can watch the video [here](https://www.youtube.com/watch?v=FgakZw6K1QQ).

## Article from Nature Methods

The provided article from Nature Methods (2017) also provides a useful 2-page summary of PCA. After reading this article, write 'Done' in the field below.

_Write 'Done' Here:_

# sklearn Pipeline Function

Although it has appeared in some earlier code, a useful `sklearn` tool for PCA and later analysis is the `Pipeline` class. A `Pipeline` lets you define a sequence of transformations that are applied to your data before fitting a model. It is particularly useful for ensuring that your training and testing data both receive the same series of transformations.

For example, when fitting a PCA model, we first need to standardize our data, then estimate and store the principal components ($Z_1, Z_2,\dots, Z_m$). If we subsequently wish to use these principal components to make estimates on testing data, we need to ensure the testing data undergoes the same scaling process and that the same principal component loading factors are applied to transform the columns of the testing data. Creating a pipeline is an efficient way to control the application of these steps.
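The scaling and PCA steps described above can be sketched as follows. This is a minimal illustration rather than code from the readings: the synthetic data, the choice of two components, and the names `pca_pipe`, `Z_train`, and `Z_test` are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): 200 rows, 5 features on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([1.0, 10.0, 100.0, 0.1, 5.0])

X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

# Step 1 standardizes the columns; step 2 estimates the principal components.
pca_pipe = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=2)),
])

# Means, standard deviations, and loadings are learned from the training data only.
Z_train = pca_pipe.fit_transform(X_train)

# The testing data is scaled and projected with those stored training quantities.
Z_test = pca_pipe.transform(X_test)

print(Z_train.shape, Z_test.shape)
```

Calling `transform` (not `fit_transform`) on the testing data is the key design choice: it reuses the scaling parameters and loadings learned from the training data instead of re-estimating them.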

Read the pipeline [user guide](https://scikit-learn.org/stable/modules/compose.html#pipeline) and examine the following two examples related to PCA:

1. [Selecting dimensionality reduction](https://scikit-learn.org/stable/auto_examples/compose/plot_compare_reduction.html#sphx-glr-auto-examples-compose-plot-compare-reduction-py)
2. [Chaining a PCA and a logistic regression](https://scikit-learn.org/stable/auto_examples/compose/plot_digits_pipe.html#sphx-glr-auto-examples-compose-plot-digits-pipe-py)

While you do not need to understand every function used in these examples, pay attention to the creation of the pipeline object in the code beginning:

```python
pipe = Pipeline()
```

This call takes the list of steps you wish to apply to the data, which can include both pre-processing transformations and a final model.
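As a rough sketch of a pipeline that combines pre-processing with model fitting, the example below chains scaling, PCA, and a logistic regression on the digits dataset bundled with `sklearn`. It is not the code from the linked examples: the step names, the 20 components, and the `max_iter` value are illustrative assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small image dataset that ships with scikit-learn (no download required).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each step is a (name, estimator) pair; the last step is the model.
pipe = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=20)),               # assumed number of components
    ("logistic", LogisticRegression(max_iter=1000)),
])

# fit() runs every transformation on the training data, then fits the model.
pipe.fit(X_train, y_train)

# score() and predict() push the testing data through the same fitted steps.
accuracy = pipe.score(X_test, y_test)
predictions = pipe.predict(X_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Individual fitted steps remain accessible by name, for example `pipe.named_steps["pca"]`, which is useful when you want to inspect the loadings or the explained variance.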

After reviewing this documentation, write 'Done' in the field below.

_Write 'Done' Here:_

# Code Review From Problem Set 11

In Problem Set 11, I asked you to fit a variety of bagging and random forest models, both by hand and using built-in functions. There were many possible ways to approach fitting these models, storing the model predictions, and deciding on the hyperparameters to use (e.g. the maximum number of features available for the random forest).

Working with one or more other members of the class, identify two problems you found challenging or where your approach differed from that of your classmates.

After this review, please briefly describe one new approach or function you learned and when or how you might use it in the future.

1. Casey used 70 variables, since random forest and bagging are not affected by the dummy variable trap.
2. Anirush used a list while I used a dictionary; I think using a list was the better approach.
3. They set `n_estimators` equal to 300, while I did not specify it; by default it is 100.
4. On the difference between random forest and bagging: on average they have the same bias, but their variance differs.

# Readings on AI and Algorithmic Regulation

## Acemoglu and Johnson - Rebalancing AI

Read the provided piece by Acemoglu and Johnson. This piece draws on the authors' larger book _Power and Progress_, which examines historical waves of technological progress. A big theme of this work is understanding the extent to which, if at all, the current rise of AI differs from previous technological innovations, and what the historical impacts of those technological revolutions have been.

After reading the article, explain what the authors mean when they discuss the importance of 'new tasks' and how this shapes the potential long-run impact of automation from AI. The authors also differentiate between productive automation and 'so-so automation.' What is this distinction? Are there any sectors beyond those referenced in the piece that you think are particularly susceptible to automation that fails to create new tasks? 



_Your answer here:_

## Edwards and Veale - Enslaving the Algorithm

Read the provided piece by Edwards and Veale. 

After reading the piece, respond to the following [discussion board](https://moodle.lse.ac.uk/mod/hsuforum/view.php?id=1809728) on the Moodle course page which asks:

1. Does there exist a right to an explanation? (Thinking legally as well as morally)
2. How can a right to an explanation sufficiently protect individual interests? How may this right fall short? 

After responding to these questions, write 'Done' in the field below.

_Write 'Done' Here:_