# Moore's Law: Loading, Plotting, and Fitting Real Data

In 1965, engineer Gordon Moore observed that the number of transistors on a microchip was doubling roughly every year. In 1975 he revised the rate to a doubling approximately every two years. This revised prediction, known as **Moore's Law**, held remarkably well for about five decades and drove much of the computing revolution.

In this notebook, you will:
1. Load historical transistor count data from a CSV file
2. Plot the data on a log scale to visualize exponential growth
3. Add Moore's theoretical prediction to your plot
4. Fit your own exponential growth curve to the data using `scipy.optimize.curve_fit`
5. Compare your fitted model to Moore's original prediction
6. Export your results to CSV and NumPy binary files

---

## Part 1: Installing and Importing Libraries

This notebook uses three libraries that are not part of the Python standard library:

- **NumPy** (`numpy`): Efficient numerical arrays and math functions
- **Matplotlib** (`matplotlib`): Plotting and data visualization
- **SciPy** (`scipy`): Scientific computing tools, including curve fitting

### Installing with pip

`pip` is Python's standard package installer and is included with most Python installations. To check that it is available, open a terminal and run:

```
pip --version
```

If `pip` is not found, follow the installation instructions here: https://pip.pypa.io/en/stable/installation/

You can install packages directly from a Jupyter code cell by prefixing a shell command with `!`. The cell below installs all three libraries. Run it once; you won't need to run it again unless you switch environments.

In [None]:
# Install required libraries — run this cell once, then you can skip it in future sessions
!pip install numpy matplotlib scipy

Now import the libraries. By convention, `numpy` is aliased as `np`, and `matplotlib.pyplot` is aliased as `plt`. Import `curve_fit` directly from `scipy.optimize`.

In [None]:
# Import numpy, matplotlib.pyplot, and curve_fit from scipy.optimize


---

## Part 2: Loading the Data

The file `transistor_data.csv` contains historical data on microprocessors. Open the file in a text editor or spreadsheet application and take a look at its structure before loading it into Python. You should be able to find it in the same directory as this notebook.

The file has the following columns:

| Processor | MOS transistor count | Date of Introduction | Designer | MOSprocess | Area |
|-----------|----------------------|----------------------|----------|------------|------|

You only need the **transistor count** (column index 1) and the **year** (column index 2). Column indices are zero-based, so index 0 is the processor name.

Use `np.loadtxt` to load just those two columns. Key arguments you will need:
- `delimiter`: the character separating columns in the file
- `usecols`: a list of column indices to load
- `skiprows`: number of header rows to skip

Consult the [`np.loadtxt` documentation](https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html) if needed.
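As a rough illustration of how these three arguments work together, the sketch below reads a tiny in-memory CSV instead of the real file. The rows and the names `sample_csv` and `sample` are made up for this example; only the column layout mirrors the table above.

```python
import io
import numpy as np

# Hypothetical three-row file with the same column layout as the table above.
# The rows are made-up placeholders, not values from transistor_data.csv.
sample_csv = io.StringIO(
    "Processor,MOS transistor count,Date of Introduction,Designer,MOSprocess,Area\n"
    "Chip A,1000,1970,Maker X,10000,10\n"
    "Chip B,4000,1974,Maker Y,6000,20\n"
    "Chip C,16000,1978,Maker Z,3000,30\n"
)

# Skip the single header row and keep only columns 1 and 2 (zero-based).
sample = np.loadtxt(sample_csv, delimiter=",", usecols=[1, 2], skiprows=1)

print(sample.shape)  # (3, 2)
print(sample)
```

Because `usecols` restricts which columns are parsed, the text columns (processor name, designer) never need to be converted to numbers.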

In [None]:
# Load transistor count (col 1) and year (col 2) from transistor_data.csv into an array called 'data'


Separate `data` into two 1D arrays: `transistor_count` and `year`. Then print the first 10 values of each to verify the data loaded correctly.

In [None]:
# Separate data into transistor_count and year arrays, then print the first 10 values of each


---

## Part 3: Working in Log Space

Transistor counts span many orders of magnitude — from thousands in the 1970s to tens of billions today. When data grows exponentially, it is useful to work with the **natural logarithm** of the data.

If transistor count grows exponentially, then:

$$\text{transistor\_count} = e^{A \cdot \text{year} + B}$$

Taking the natural log of both sides gives a **linear** relationship:

$$\ln(\text{transistor\_count}) = A \cdot \text{year} + B$$

This means that if we define $y_i = \ln(\text{transistor\_count}_i)$, we expect $y_i$ to be a linear function of year — and we can use linear curve fitting to find the constants $A$ and $B$.

Create a new array `yi` that contains the natural log of the transistor counts.
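To see why the log transform produces a straight line, here is a small sketch on a made-up series that doubles every two steps. The names `t`, `counts`, and `log_counts` are illustrative and are not part of the assignment.

```python
import numpy as np

# Hypothetical exponential series: starts at 100 and doubles every 2 steps.
t = np.arange(0, 10, 2)            # 0, 2, 4, 6, 8
counts = 100 * 2.0 ** (t / 2)      # 100, 200, 400, 800, 1600

log_counts = np.log(counts)        # np.log is the natural logarithm

# For exponential growth, equal steps in t give equal steps in log space.
steps = np.diff(log_counts)
print(steps)                       # each entry is ln(2), about 0.693
```

Constant differences in log space are exactly what a linear function of `t` looks like, which is why a straight-line fit is appropriate there.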

In [None]:
# Compute yi = natural log of transistor_count


---

## Part 4: Scatter Plot of the Data

Before fitting any model, plot the raw data to get a feel for it.

Create a scatter plot with `year` on the x-axis and `yi` (log transistor count) on the y-axis. Use `plt.subplots()` to create the figure and axes, and use the axes object (e.g., `ax`) to plot. Label your axes and give your plot a title.
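The axes-based plotting pattern described above might look like the following sketch. It uses made-up points (`x_demo`, `y_demo`) rather than the transistor data, and the labels are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up points for illustration only (not the transistor data).
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# plt.subplots() returns a Figure and an Axes; plot through the Axes object.
fig, ax = plt.subplots()
ax.scatter(x_demo, y_demo, label="demo points")
ax.set_xlabel("x (arbitrary units)")
ax.set_ylabel("y (arbitrary units)")
ax.set_title("Demo scatter plot")
ax.legend()
```

In a notebook the figure is displayed when the cell finishes; in a plain script you would call `plt.show()` as well.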

In [None]:
# Create a scatter plot of yi vs year with labeled axes and a title


---

## Part 5: Moore's Theoretical Prediction

Moore's Law, in the two-year form used here, says that transistor count **doubles every two years**. We anchor the prediction at the Intel 4004, which had approximately 2,250 transistors when it was introduced in 1971.

We can express Moore's Law as:

$$\text{transistor\_count} = e^{B_M} \cdot e^{A_M \cdot \text{year}}$$

where the constants are derived from the doubling condition and the 1971 starting point:

$$A_M = \frac{\ln(2)}{2} \approx 0.347 \qquad B_M = \ln(2250) - A_M \cdot 1971$$
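One way to see where these constants come from is to write out both conditions. The rounded value of $B_M$ at the end is only a rough check for your own calculation.

Doubling over any two-year interval fixes the slope:

$$\frac{e^{A_M (\text{year} + 2) + B_M}}{e^{A_M \cdot \text{year} + B_M}} = e^{2 A_M} = 2 \quad\Rightarrow\quad A_M = \frac{\ln(2)}{2}$$

Requiring the curve to pass through 2,250 transistors in 1971 fixes the intercept:

$$\ln(2250) = A_M \cdot 1971 + B_M \quad\Rightarrow\quad B_M = \ln(2250) - A_M \cdot 1971 \approx 7.719 - 683.097 \approx -675.4$$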

In log space, Moore's Law is simply:

$$\ln(\text{transistor\_count}) = A_M \cdot \text{year} + B_M$$

Define `A_M` and `B_M` using the formulas above, then write a function `moores_law(year)` that returns the **log** of the predicted transistor count for a given year (i.e., $A_M \cdot \text{year} + B_M$).

In [None]:
# Define A_M and B_M, then define moores_law(year) returning the log-space prediction


Add Moore's Law prediction to your scatter plot from Part 4. Plot `moores_law(year)` as a line over the scatter of `yi` vs `year`. Include a legend so the reader can distinguish data from the prediction.

In [None]:
# Reproduce your scatter plot and add Moore's Law as a line; include a legend


---

## Part 6: Curve Fitting with `scipy.optimize.curve_fit`

Moore made his prediction from only a few years of early data. We can do better by fitting a model directly to the full data set. `scipy.optimize.curve_fit` finds the parameter values that make a given function best match observed data by minimizing the sum of squared residuals.

**How it works:**
You provide:
1. A model function `f(x, param1, param2, ...)` that you want to fit
2. Your x-data
3. Your y-data

`curve_fit` returns:
- `popt`: the optimal parameter values
- `pcov`: the estimated covariance matrix of the parameters; the square roots of its diagonal entries estimate the one-standard-deviation uncertainty of each parameter

Since we are working in log space, our model is a straight line:

$$y_i = A \cdot \text{year} + B$$

Define a Python function `linear_model(year, A, B)` that implements this equation, then use `curve_fit` to fit it to `year` and `yi`. Extract the fitted parameters `A` and `B` from `popt`.

Consult the [`scipy.optimize.curve_fit` documentation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html) if needed.
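The calling pattern can be sketched on synthetic data where the true answer is known. Everything here (`line`, `x_demo`, `y_demo`, the slope of 0.5 and intercept of -3) is invented for illustration and is unrelated to the transistor data.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, slope, intercept):
    """Straight-line model; the first argument must be the independent variable."""
    return slope * x + intercept

# Synthetic data: a known line (slope 0.5, intercept -3) plus small seeded noise.
rng = np.random.default_rng(0)
x_demo = np.linspace(0.0, 10.0, 50)
y_demo = line(x_demo, 0.5, -3.0) + rng.normal(scale=0.05, size=x_demo.size)

# curve_fit returns the best-fit parameters and their estimated covariance.
popt, pcov = curve_fit(line, x_demo, y_demo)
slope_fit, intercept_fit = popt

# One-standard-deviation uncertainty estimates for each parameter.
perr = np.sqrt(np.diag(pcov))

print(slope_fit, intercept_fit)
print(perr)
```

The fitted values should land close to 0.5 and -3. Checking a fit against data with a known answer is a quick way to confirm the model function and argument order are set up correctly before fitting real data.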

In [None]:
# Define linear_model(year, A, B), then use curve_fit to fit it to (year, yi)
# Extract fitted parameters A and B from popt


Print the fitted values of `A` and `B`, and compare them to Moore's constants `A_M` and `B_M`. Then compute the implied doubling time for your fitted model.

**Hint:** If transistor count doubles every $T$ years, then $e^{A \cdot T} = 2$, so $T = \ln(2) / A$.
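As a quick sanity check of the hint, the sketch below wraps the formula in a hypothetical helper called `doubling_time` and applies it to Moore's rate and to an arbitrary made-up rate.

```python
import numpy as np

def doubling_time(rate):
    """Time T for exp(rate * T) to equal 2, i.e. T = ln(2) / rate."""
    return np.log(2) / rate

# Moore's rate of ln(2)/2 per year should give a doubling time of 2 years.
print(doubling_time(np.log(2) / 2))

# A hypothetical faster rate gives a shorter doubling time.
print(doubling_time(0.5))
```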

In [None]:
# Print A, B, A_M, B_M, and the implied doubling time from your fitted model


---

## Part 7: Final Plot — Log Space

Now bring everything together. Create a plot in log space that shows:
1. The raw data (`yi` vs `year`) as a scatter
2. Moore's Law prediction as a line
3. Your fitted model as a line

Make sure to include a legend, axis labels, and a descriptive title.

In [None]:
# Create the final combined log-space plot with scatter data, Moore's Law, and your fitted model


---

## Part 8: Transforming to Original Units with `semilogy`

Log space is useful for fitting, but the original units (number of transistors) are easier to interpret. Matplotlib's `ax.semilogy` plots data on a **log-scaled y-axis** while keeping the x-axis linear — giving you the visual benefits of log space while displaying the actual transistor counts.

To use `semilogy`, you need to convert your model predictions back from log space to transistor counts:

$$\text{transistor\_count\_predicted} = e^{A \cdot \text{year} + B}$$

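Compute `transistor_count_predicted` and `transistor_count_moores_law` from your model and Moore's model, respectively. Then reproduce your final plot using `ax.semilogy` instead of `ax.plot`/`ax.scatter`. Because `ax.semilogy` draws a connected line by default, pass a marker format string (for example `"s"`) for the raw data so it still appears as individual points.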
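The conversion and the plotting call can be sketched with placeholder numbers. The slope, intercept, and `x_demo` values below are hypothetical and chosen only to produce a visible trend.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical log-space line parameters, for illustration only.
slope_demo, intercept_demo = 0.3, 2.0
x_demo = np.arange(0, 20)

# Convert a log-space prediction back to original units with np.exp.
y_demo = np.exp(slope_demo * x_demo + intercept_demo)

fig, ax = plt.subplots()
ax.semilogy(x_demo, y_demo, "s", label="points (marker only)")
ax.semilogy(x_demo, y_demo, "-", label="line")
ax.set_xlabel("x")
ax.set_ylabel("y (log scale)")
ax.legend()
```

The curve appears straight because the y-axis is logarithmic, while the tick labels show values in the original units.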

In [None]:
# Compute predictions in original units (transistor count), then plot using semilogy


---

## Part 9: Exporting Your Results

### 9a: Save to CSV with `np.savetxt`

Build a 2D array called `output` with four columns: `year`, `transistor_count`, `transistor_count_predicted`, and `transistor_count_moores_law`. You will need to reshape each 1D array to a column vector using `[:, np.newaxis]` before combining them with `np.block`.

Save the result to `mooreslaw_results.csv` using `np.savetxt` with `delimiter=","` so the columns are comma-separated. Include a descriptive header string that names the columns.
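A minimal sketch of the column-stacking and saving steps, using two made-up arrays and a temporary file so nothing is left on disk. The names `a`, `b`, `table`, and `demo.csv` are placeholders.

```python
import tempfile
from pathlib import Path
import numpy as np

# Two made-up 1D arrays of equal length.
a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# Turn each into an (N, 1) column, then place the columns side by side.
table = np.block([a[:, np.newaxis], b[:, np.newaxis]])
print(table.shape)  # (3, 2)

# Write to a throwaway directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.csv"
    np.savetxt(path, table, delimiter=",", header="a, b")
    text = path.read_text()
    reloaded = np.loadtxt(path, delimiter=",")

# np.savetxt prefixes the header with "# " by default.
print(text.splitlines()[0])
```

Because the header line starts with `#`, `np.loadtxt` treats it as a comment and skips it when reading the file back.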

In [None]:
# Build output 2D array and save to mooreslaw_results.csv using np.savetxt


### 9b: Save to NumPy Binary with `np.save`

`np.save` saves a single NumPy array to a binary `.npy` file. Binary files are not human-readable, but they are usually more compact than text files, they preserve the array's dtype and shape, and they load back into Python exactly with no text parsing.

Save your `output` array to `mooreslaw_results.npy` using `np.save`.
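The save-and-reload round trip can be sketched as follows, again with a small made-up array (`demo`) and a temporary file rather than the assignment's output.

```python
import tempfile
from pathlib import Path
import numpy as np

# Small made-up array for illustration.
demo = np.array([[1.0, 2.5], [3.0, 4.5]])

# Save to a throwaway directory, then load the array back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.npy"
    np.save(path, demo)
    restored = np.load(path)

# The restored array has the same values, dtype, and shape as the original.
print(np.array_equal(demo, restored))
print(restored.dtype, restored.shape)
```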

In [None]:
# Save the output array to mooreslaw_results.npy using np.save


### 9c: Inspect Your Files

Before loading the files back into Python, go find them in your file system and open each one:

- Open `mooreslaw_results.csv` in a text editor or spreadsheet application. Can you read it?
- Try to open `mooreslaw_results.npy` in a text editor. What do you see?
- Right-click each file and check its size (or use the terminal commands `ls -lh mooreslaw_results.csv` and `ls -lh mooreslaw_results.npy`).

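Then, load both files back into Python to confirm the data was saved correctly. Use `np.loadtxt` (with `delimiter=","`) for the CSV and `np.load` for the `.npy` file. By default, `np.loadtxt` skips the header line because `np.savetxt` prefixes it with `#`.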

In [None]:
# Load mooreslaw_results.csv and mooreslaw_results.npy back into Python and print the first 5 rows of each


---

## Part 10: Reflection Questions

Answer the following questions in the markdown cells below. Your responses should be written in your own words — do not use an LLM to write these.

**1.** After inspecting both exported files, describe the differences you observed between the `.csv` and `.npy` formats. What are the trade-offs between the two? In what situations would you prefer one over the other?


*Your answer here.*

**2.** Your fitted doubling time is close to, but not exactly, 2 years. Does this mean Moore's Law is wrong, or that the data doesn't follow an exponential trend? Explain what your fitted model is actually telling you, and why there is a difference between the fitted slope and Moore's predicted slope.

*Your answer here.*

**3.** Describe how you used any AI tools during this assignment. What did you use them for? Were there moments where the AI-generated code didn't work or wasn't quite right? How did you handle that?

*Your answer here.*