# Moore's Law: Loading, Plotting, and Fitting Real Data
## Due: Friday, March 6, 2026
In 1965, engineer Gordon Moore predicted that the number of transistors on a microchip would double every year; in 1975 he revised the rate to approximately every two years. This prediction, known as **Moore's Law**, held remarkably well for over five decades and drove much of the computing revolution.

In this notebook, you will:
1. Load historical transistor count data from a CSV file
2. Plot the raw data and identify why a standard linear scale is limiting
3. Use a log-scaled axis to check for exponential growth, then transform the data for fitting
4. Add Moore's theoretical prediction and fit your own model using `scipy.optimize.curve_fit`
5. Export your results to CSV and NumPy binary files
6. Filter the data and plot a histogram of chip introductions by year

---

## Part 1: Installing and Importing Libraries

This notebook uses three libraries that are not part of the Python standard library:

- **NumPy** (`numpy`): Efficient numerical arrays and math functions
- **Matplotlib** (`matplotlib`): Plotting and data visualization
- **SciPy** (`scipy`): Scientific computing tools, including curve fitting

### Installing with pip

`pip` is Python's standard package installer. If you have Python installed, you likely already have `pip`. To check, open a terminal and run:

```
pip --version
```

If `pip` is not found, follow the installation instructions here: https://pip.pypa.io/en/stable/installation/

You can install packages directly from a Jupyter code cell using `!` to run a shell command. The cell below will install all three libraries. Run it once — you won't need to run it again unless you switch environments.

In [None]:
# Install required libraries — run this cell once, then you can skip it in future sessions
!pip install numpy matplotlib scipy

Now import the libraries. By convention, `numpy` is aliased as `np` and `matplotlib.pyplot` is aliased as `plt`. Import `curve_fit` directly from `scipy.optimize`.

In [None]:
# Import numpy, matplotlib.pyplot, and curve_fit from scipy.optimize


---

## Part 2: Loading the Data

The file `transistor_data.csv` contains historical data on microprocessors. Before writing any code, open the file in a text editor or spreadsheet application to examine its structure. You should find it in the same directory as this notebook.

The columns are:

| Processor | MOS transistor count | Date of Introduction | Designer | MOSprocess | Area |
|-----------|----------------------|----------------------|----------|------------|------|

You only need the **transistor count** (column index 1) and the **year** (column index 2).

Use `np.loadtxt` to load just those two columns. Key arguments:
- `delimiter`: the character separating columns
- `usecols`: a list of column indices to load
- `skiprows`: number of header rows to skip

See the [`np.loadtxt` documentation](https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html) for details.
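
As a minimal sketch of how these three arguments work together, the example below parses a small made-up table from an in-memory string instead of `transistor_data.csv`. The column names and values are invented for illustration, and `io.StringIO` only stands in for a file on disk.

```python
import io

import numpy as np

# Hypothetical three-column table; the names and numbers are made up.
csv_text = """name,mass_kg,year
alpha,1.5,2001
beta,2.5,2005
gamma,4.0,2010
"""

# skiprows=1 drops the header line, delimiter="," splits on commas, and
# usecols=[1, 2] keeps only the two numeric columns (the text column
# could not be converted to float).
sample = np.loadtxt(io.StringIO(csv_text), delimiter=",", usecols=[1, 2], skiprows=1)

print(sample.shape)   # (3, 2)
print(sample[:, 0])   # first loaded column: mass_kg
print(sample[:, 1])   # second loaded column: year
```

The result is a 2D array with one row per data line and one column per entry in `usecols`, so slicing with `[:, 0]` and `[:, 1]` pulls out each column as a 1D array.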

In [None]:
# Load transistor count (col 1) and year (col 2) from transistor_data.csv into an array called 'data'


Separate `data` into two 1D arrays: `transistor_count` and `year`. Print the first 10 values of each to verify the data loaded correctly.

In [None]:
# Separate data into transistor_count and year arrays; print the first 10 values of each


---

## Part 3: Plotting the Raw Data

Before doing any analysis, plot the raw data. Use `plt.figure()` to create a new figure, then `plt.scatter()` to plot `transistor_count` on the y-axis against `year` on the x-axis. Add axis labels with `plt.xlabel()` and `plt.ylabel()`, a title with `plt.title()`, and display the figure with `plt.show()`.

See the [`plt.scatter()` documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html) and the [Matplotlib pyplot tutorial](https://matplotlib.org/stable/tutorials/pyplot.html) for more details.

In [None]:
# Scatter plot of raw transistor_count vs year


You may notice two things about this plot:

1. The data has a strongly curved shape — a few modern chips dominate the y-axis while most of the historical data is squashed near zero. This is characteristic of **exponential growth**: the values span many orders of magnitude and a linear scale cannot show them all clearly.
2. The data looks somewhat **"stripy"**: multiple chips were often introduced in the same year, so many points share an x-coordinate. Keep this in mind for Part 10.

## Part 3b: Checking for Exponential Growth with a Log-Scaled Axis

If data grows exponentially, plotting it on a **logarithmic y-axis** should make it appear roughly linear — because taking the log of an exponential gives a straight line. You can apply a log scale to any Matplotlib plot with `plt.yscale('log')`, without transforming the data itself.
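
The sketch below illustrates the idea on synthetic data (a made-up quantity that doubles at every step), not on the transistor dataset. The `matplotlib.use("Agg")` line is an assumption added only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data, not the transistor dataset: a value that doubles every step.
x = np.arange(0, 20)
y = 3.0 * 2.0 ** x

plt.figure()
plt.scatter(x, y)
plt.yscale("log")       # changes only the axis scaling; y itself is untouched
plt.xlabel("step")
plt.ylabel("value (log scale)")
plt.title("Exponential growth on a log-scaled y-axis")

current_scale = plt.gca().get_yscale()
print(current_scale)
```

In a notebook, drop the `matplotlib.use` line and finish with `plt.show()`. The points should fall on a straight line because each step multiplies `y` by the same factor.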

Reproduce your scatter plot from Part 3 and add `plt.yscale('log')` before `plt.show()`. Does the data look approximately linear on the log scale?

In [None]:
# Reproduce the scatter plot and add plt.yscale('log') to check for linearity


The data does appear roughly linear on a log scale, confirming that an exponential model is appropriate. In the next part you will work with the log-transformed data directly so you can fit a straight line to it.

---

## Part 4: Working in Log Space

Part 3b showed that the data is approximately linear on a log scale. Now make that explicit by computing the log of the data. If transistor count follows exponential growth:

$$\text{transistor\_count} = e^{A \cdot \text{year} + B}$$

then taking the natural log of both sides gives a **linear** relationship:

$$\ln(\text{transistor\_count}) = A \cdot \text{year} + B$$

Defining $y_i = \ln(\text{transistor\_count}_i)$ means we can fit a straight line to $y_i$ as a function of year, and the slope and intercept will describe the exponential growth rate.
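
A quick numerical check of this claim, using made-up growth parameters (`A_demo` and `B_demo` are illustrative values, not fitted ones): after taking the log, consecutive points differ by a constant amount equal to the slope.

```python
import numpy as np

# Made-up growth parameters, chosen only for illustration.
A_demo = 0.3
B_demo = 1.0

t = np.arange(10.0)                   # evenly spaced "years" 0..9
counts = np.exp(A_demo * t + B_demo)  # exponential growth

log_counts = np.log(counts)           # natural log, like yi in the text

# On a straight line, consecutive points differ by a constant amount.
steps = np.diff(log_counts)
print(np.allclose(steps, A_demo))         # slope recovered from the log values
print(np.isclose(log_counts[0], B_demo))  # intercept at t = 0
```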

Create a new array `yi` containing the natural log of `transistor_count`. Then create a scatter plot of `yi` vs `year`.

In [None]:
# Compute yi = ln(transistor_count); create a scatter plot of yi vs year


---

## Part 5: Moore's Theoretical Prediction

Moore's Law, in its commonly cited form, says that transistor count **doubles every two years**. Here the prediction is anchored at approximately 2,250 transistors on the Intel 4004 chip in 1971.

Expressing Moore's Law as an exponential:

$$\text{transistor\_count} = e^{B_M} \cdot e^{A_M \cdot \text{year}}$$

The constants come from the doubling condition and the 1971 anchor point:

$$A_M = \frac{\ln(2)}{2} \approx 0.347 \qquad B_M = \ln(2250) - A_M \cdot 1971$$

In log space, this is simply the line:

$$\ln(\text{transistor\_count}) = A_M \cdot \text{year} + B_M$$
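
The same two conditions determine the line for any doubling time and anchor point. The sketch below uses a hypothetical helper, `doubling_line`, with made-up numbers (doubling every 3 years, starting from 100 units in 2000) and checks both conditions numerically.

```python
import numpy as np

def doubling_line(doubling_time, anchor_year, anchor_count):
    """Return (slope, intercept) of ln(count) = slope * year + intercept.

    Hypothetical helper for illustration; not part of the assignment.
    """
    slope = np.log(2) / doubling_time
    intercept = np.log(anchor_count) - slope * anchor_year
    return slope, intercept

# Made-up example: doubling every 3 years, 100 units in the year 2000.
slope, intercept = doubling_line(3.0, 2000, 100.0)

count_2000 = np.exp(slope * 2000 + intercept)
count_2003 = np.exp(slope * 2003 + intercept)

print(np.isclose(count_2000, 100.0))   # the line passes through the anchor point
print(np.isclose(count_2003, 200.0))   # one doubling time later the count has doubled
```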

Define `A_M` and `B_M` using the formulas above. Then write a function `moores_law(year)` that returns the **log-space** prediction: $A_M \cdot \text{year} + B_M$.

In [None]:
# Define A_M and B_M; define moores_law(year) returning the log-space prediction


Reproduce your log-space scatter plot from Part 4 and add Moore's Law as a line. Use `plt.plot()` for the line and `plt.legend()` to add a legend.

In [None]:
# Scatter plot of yi vs year with Moore's Law overlaid as a line; include a legend


---

## Part 6: Curve Fitting with `scipy.optimize.curve_fit`

The Moore's Law line in Part 5 was fixed by hand from the doubling condition and a single anchor point. We can instead let the data speak for themselves by fitting a model directly. `scipy.optimize.curve_fit` finds the parameter values that minimize the sum of squared differences between your model and the observed data.

**How to use it:**

Provide:
1. A model function `f(x, param1, param2, ...)`
2. Your x-data
3. Your y-data

`curve_fit` returns:
- `popt`: the best-fit parameter values (in the same order as your function signature)
- `pcov`: the estimated covariance matrix of the parameters (the square roots of its diagonal entries give one-standard-deviation uncertainties)
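
As a rough sketch of the calling pattern, the example below fits a straight line to synthetic noisy data with known parameters. The function name `line`, the arrays `x_demo` and `y_demo`, and the true values 2.0 and -1.0 are all made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, slope, intercept):
    """Model function: the first argument is x, the rest are fit parameters."""
    return slope * x + intercept

# Synthetic data with known parameters plus a little noise (fixed seed).
rng = np.random.default_rng(0)
x_demo = np.linspace(0.0, 10.0, 50)
y_demo = line(x_demo, 2.0, -1.0) + rng.normal(scale=0.1, size=x_demo.size)

popt, pcov = curve_fit(line, x_demo, y_demo)
slope_fit, intercept_fit = popt      # same order as the function signature

# One-standard-deviation uncertainties from the diagonal of pcov.
perr = np.sqrt(np.diag(pcov))

print(f"slope     = {slope_fit:.3f} +/- {perr[0]:.3f}")
print(f"intercept = {intercept_fit:.3f} +/- {perr[1]:.3f}")
```

The fitted values should land close to the true ones, with small uncertainties because the noise is small relative to the trend.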

Since we are working in log space, our model is the line $y_i = A \cdot \text{year} + B$.

Define a function `linear_model(year, A, B)` that returns $A \cdot \text{year} + B$. Then call `curve_fit` with `linear_model`, `year`, and `yi` as arguments. Extract the fitted `A` and `B` from `popt`.

See the [`scipy.optimize.curve_fit` documentation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html) for details.

In [None]:
# Define linear_model(year, A, B); use curve_fit to fit it to (year, yi); extract A and B from popt


Print the fitted `A` and `B` alongside Moore's `A_M` and `B_M`. Then compute the **implied doubling time** from your fitted slope.

**Hint:** If count doubles every $T$ years, then $e^{A \cdot T} = 2$, which gives $T = \ln(2) / A$.
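
One way to sanity-check this formula before applying it to a fitted slope: Moore's slope $\ln(2)/2$ should give back a doubling time of 2 years. The helper name `doubling_time` and the second slope value are illustrative assumptions.

```python
import numpy as np

def doubling_time(rate):
    """Doubling time T for growth exp(rate * t), from exp(rate * T) = 2."""
    return np.log(2) / rate

# Sanity check with the slope ln(2)/2: this should give back 2 years.
moore_slope = np.log(2) / 2
print(doubling_time(moore_slope))

# A made-up steeper slope doubles faster.
print(doubling_time(0.5))
```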

In [None]:
# Print A, B, A_M, B_M, and the implied doubling time from your fitted slope


---

## Part 7: Final Plot

Bring everything together in one plot showing:
1. The log-transformed data (`yi` vs `year`) as a scatter
2. Moore's Law as a line
3. Your fitted model as a line

Include axis labels, a title, and a legend.

In [None]:
# Final plot: scatter of yi vs year, Moore's Law, and your fitted model


---

## Part 8: Exporting Your Results

### Part 8a: Save to CSV with `np.savetxt`

Build a 2D array `output` with four columns: `year`, `transistor_count`, `yi`, and `moores_law(year)`. Each 1D array needs to be reshaped into a column vector using `[:, np.newaxis]` before combining them with `np.block`.

Save the result to `mooreslaw_results.csv` using `np.savetxt` with a descriptive `header` string and a comma delimiter.
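
The sketch below shows the same pattern on two small made-up arrays, writing to a temporary directory so that nothing is left behind. The file name `demo_table.csv` and the header text are hypothetical.

```python
import os
import tempfile

import numpy as np

# Made-up 1D arrays of equal length.
a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# a[:, np.newaxis] has shape (3, 1); np.block joins the columns side by side.
table = np.block([a[:, np.newaxis], b[:, np.newaxis]])
print(table.shape)   # (3, 2)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo_table.csv")   # hypothetical file name
    np.savetxt(path, table, delimiter=",", header="a, b")

    with open(path) as f:
        first_line = f.readline().rstrip("\n")

print(first_line)    # header as written by np.savetxt
```

Note that `np.savetxt` prefixes the header with `# ` by default, which marks it as a comment line when the file is read back.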

In [None]:
# Build the output 2D array and save to mooreslaw_results.csv using np.savetxt


### Part 8b: Save to NumPy Binary with `np.save`

`np.save` stores a NumPy array in a compact binary `.npy` file. Unlike CSV, binary files are not human-readable, but they are more space-efficient and load back into Python quickly with no parsing overhead.
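
A minimal round-trip sketch, assuming a made-up array and a hypothetical file name (`demo_array.npy`) inside a temporary directory:

```python
import os
import tempfile

import numpy as np

# Made-up array; any numeric array works the same way.
arr = np.linspace(0.0, 1.0, 12).reshape(4, 3)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo_array.npy")   # hypothetical file name
    np.save(path, arr)
    restored = np.load(path)
    file_size = os.path.getsize(path)

# The binary round trip preserves values, shape, and dtype exactly.
print(np.array_equal(arr, restored))
print(restored.dtype, restored.shape)

# The file holds the raw array bytes plus a small header describing them.
print(file_size - arr.nbytes)
```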

Save your `output` array to `mooreslaw_results.npy` using `np.save`.

In [None]:
# Save the output array to mooreslaw_results.npy using np.save


### Part 8c: Inspect Your Files

Before doing anything else, go to your file explorer and locate both `mooreslaw_results.csv` and `mooreslaw_results.npy` — they should be in the same directory as this notebook.

Try opening **each file** in a plain text editor (e.g. Notepad, TextEdit, gedit — not Excel or a spreadsheet app).

- What do you see when you open the `.csv`? Can you read the data?
- What do you see when you open the `.npy`? Can you read the data?
- Check the file size of each using the code cell below.

Write a sentence or two below describing what you observed.

In [None]:
import os
csv_size = os.path.getsize("mooreslaw_results.csv")
npy_size = os.path.getsize("mooreslaw_results.npy")
print(f"mooreslaw_results.csv : {csv_size:,} bytes")
print(f"mooreslaw_results.npy : {npy_size:,} bytes")
print(f"The .csv file is {csv_size / npy_size:.1f}x larger than the .npy file")

*Your observations here.*

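Now load both files back into Python to confirm the data saved correctly. Use `np.loadtxt` for the CSV with `delimiter=','`; the header line written by `np.savetxt` starts with `#`, which `np.loadtxt` treats as a comment and skips by default (`comments='#'`). Use `np.load` for the `.npy` file. Print the first 5 rows of each.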

In [None]:
# Load mooreslaw_results.csv and mooreslaw_results.npy; print the first 5 rows of each


---

## Part 9: Indexing Arrays

So far you have worked with entire arrays at once. Often you need to access specific elements or subsets of an array. This part introduces two ways to do that.

### Part 9a: Positional Indexing

NumPy arrays are **zero-indexed**: the first element is at position `0`, the second at position `1`, and so on. You access individual elements using square brackets:

```python
year[0]    # first element
year[1]    # second element
year[-1]   # last element (negative indices count from the end)
year[:5]   # first 5 elements (a "slice")
```

The same index works on any array of the same length. So `year[0]` and `transistor_count[0]` give you the year and transistor count for the same chip.

Use positional indexing to print the year and transistor count for the **first**, **last**, and **second to last** chips in the dataset. (You loaded only the numeric columns, so the processor names are not available.) What are they?

In [None]:
# Print the year and transistor count for the first chip in the dataset
# Then print the year and transistor count for the last chip in the dataset
# Then print the year and transistor count for the second to last chip in the dataset


Interesting: the last entry appears to be an outlier. You should be able to spot it on your plots.

Optional: This data was scraped from [this Wikipedia page](https://en.wikipedia.org/wiki/Transistor_count) (the second table). Look at the entry that corresponds to the last row of the data set. Can you find any information about that chip? Why does such a new chip have so few transistors?

### Part 9b: Boolean Indexing

You can also index an array with a **boolean array** — an array of `True` and `False` values the same length as the array you are indexing. Only the elements where the value is `True` are returned.

A boolean array is created by applying a comparison to an existing array:

```python
mask = year > 1990          # array of True/False: True wherever year > 1990
year[mask]                  # returns only the years greater than 1990
transistor_count[mask]      # the same mask applied to a different array
```

The mask and the array being indexed must have the same shape.

Create a boolean mask for entries where `year > 1980`. Apply it to both `year` and `transistor_count` to produce two filtered arrays. Print the first 5 values of each to verify.

In [None]:
# Create a boolean mask for year > 1980; apply it to year and transistor_count;
# print the first 5 values of each filtered array


---

## Part 10: Histogram of New Chips per Year

Use your filtered data to visualize how the number of new chip introductions has changed since 1980. Each row in the dataset represents one chip, so the number of rows with a given year equals the number of chips introduced that year — exactly what a histogram will count.

Since `year` contains only integer values, you can get one bar per year by choosing the bins carefully. One option is to use `np.unique` to find the unique years, then pass `bins=len(unique_years)` to `plt.hist()`. This lines up with the calendar years only when every year in the range appears at least once, because an integer `bins` value produces equal-width bins across the full range of the data. You can also pass `plt.hist()` an explicit sequence of bin edges, one bin per year, which works even when some years are missing.
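
To see how the choice of bins affects the counts, the sketch below compares two binning options on a small made-up array of years with gaps. It uses `np.histogram`, which accepts the same kind of `bins` argument as `plt.hist()` (an integer or a sequence of edges); the array `demo_years` is invented for illustration.

```python
import numpy as np

# Made-up integer years with gaps (1983 and 1984 are missing).
demo_years = np.array([1981, 1981, 1982, 1985, 1985, 1985])

unique_years, per_year = np.unique(demo_years, return_counts=True)
print(unique_years, per_year)

# Option 1: bins = number of unique years -> equal-width bins over the range.
counts_n, edges_n = np.histogram(demo_years, bins=len(unique_years))
print(counts_n)    # 1981 and 1982 land in the same bin here

# Option 2: one bin per integer year, with edges halfway between years.
year_edges = np.arange(demo_years.min(), demo_years.max() + 2) - 0.5
counts_y, _ = np.histogram(demo_years, bins=year_edges)
print(counts_y)    # one count per calendar year, zeros for the missing years
```

With gaps in the data, the first option merges neighboring years into one bar, while explicit half-integer edges keep each year separate.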

Create the histogram using `plt.hist()`, and add appropriate axis labels and a title.

In [None]:
# Histogram of chip introductions per year (post-1980 only)
# Set bins = number of unique years in the filtered data


---

## Advanced Moves
- The default font size and colors in Matplotlib are not great for presentations. Can you find a way to make your plots look nicer by increasing font sizes and choosing good colors? This can be as simple as updating some of the keywords in your plotting code, or you can explore Matplotlib's styling options to find a pre-made style you like. The [Matplotlib documentation on customizing plots](https://matplotlib.org/stable/tutorials/introductory/customizing.html) is a good place to start.
- What if you want to show your plot of Moore's law and the histogram of chip introductions side by side? Can you use Matplotlib's `plt.subplot()` to create a figure with two subplots, and put one plot in each?