<a href="https://colab.research.google.com/github/DavidSenseman/BIO1173/blob/main/Class_05_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---------------------------
**COPYRIGHT NOTICE:** This Jupyterlab Notebook is a Derivative work of [Jeff Heaton](https://github.com/jeffheaton) licensed under the Apache License, Version 2.0 (the "License"); You may not use this file except in compliance with the License. You may obtain a copy of the License at

> [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

------------------------

# **BIO 1173: Intro Computational Biology**

**Module 5: Regularization and Dropout**

* Instructor: [David Senseman](mailto:David.Senseman@utsa.edu), [Department of Biology, Health and the Environment](https://sciences.utsa.edu/bhe/), [UTSA](https://www.utsa.edu/)

### Module 5 Material

* Part 5.1: Introduction to Regularization: Ridge and Lasso
* Part 5.2: Using K-Fold Cross Validation with Keras
* Part 5.3: Using L1 and L2 Regularization with Keras to Decrease Overfitting
* **Part 5.4: Drop Out for Keras to Decrease Overfitting**
* Part 5.5: Benchmarking Keras Deep Learning Regularization Techniques



## Google CoLab Instructions

You MUST run the following code cell to get credit for this class lesson. By running this code cell, you will map your GDrive to /content/drive and print out your Google GMAIL address. Your Instructor will use your GMAIL address to verify the author of this class lesson.

In [None]:
# You must run this cell first
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    from google.colab import auth
    auth.authenticate_user()
    COLAB = True
    print("Note: Using Google CoLab")
    import requests
    gcloud_token = !gcloud auth print-access-token
    gcloud_tokeninfo = requests.get('https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=' + gcloud_token[0]).json()
    print(gcloud_tokeninfo['email'])
except:
    print("**WARNING**: Your GMAIL address was **not** printed in the output below.")
    print("**WARNING**: You will NOT receive credit for this lesson.")
    COLAB = False

## Define functions

The cell below creates the function(s) needed for this lesson. If you don't run this cell, the code in some of the examples and exercises will **not** run.

In [3]:
# Simple function to print out elapsed time
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return "{}:{:>02}:{:>05.2f}".format(h, m, s)

# Simple function to change column name in a dataframe
def rename_col_by_index(dataframe, index_mapping):
    dataframe.columns = [index_mapping.get(i, col) for i, col in enumerate(dataframe.columns)]
    return dataframe

# **Drop Out for Keras to Decrease Overfitting**

Hinton, Srivastava, Krizhevsky, Sutskever, & Salakhutdinov (2012) introduced the **_dropout regularization_** algorithm. [[Cite:srivastava2014dropout]](http://www.jmlr.org/papers/volume15/nandan14a/nandan14a.pdf)

Although dropout works differently than L1 and L2, it accomplishes the same goal—the prevention of overfitting. However, the algorithm does the task by actually _removing_ neurons and connections—at least temporarily. Unlike L1 and L2, no weight penalty is added. Dropout does not directly seek to train small weights.

Dropout works by causing hidden neurons of the neural network to be unavailable during part of the training. Dropping part of the neural network causes the remaining portion to be trained to still achieve a good score even without the dropped neurons. This technique decreases co-adaptation between neurons, which results in less overfitting.

Most neural network frameworks implement dropout as a separate layer. Dropout layers function like a regular, densely connected neural network layer. The only difference is that the dropout layers will periodically drop some of their neurons during training. You can use dropout layers on regular feedforward neural networks.

A _program_ can implement a dropout layer as a dense layer that can eliminate some of its neurons. Contrary to popular belief about the dropout layer, such a program does not permanently remove these discarded neurons. In other words, a dropout layer does **_not_** lose any of its neurons during the training process, and it will still have the same number of neurons after training. In this way, the program only _temporarily masks_ the neurons rather than dropping them.

Figure 5.DROPOUT shows how a dropout layer might be situated with other layers.

**Figure 5.DROPOUT: Dropout Regularization**
![Dropout Regularization](https://biologicslab.co/BIO1173/images/class_9_dropout.png "Dropout Regularization")

The discarded neurons and their connections are shown as dashed lines. The input layer has two input neurons as well as a bias neuron. The second layer is a dense layer with three neurons and a bias neuron. The third layer is a dropout layer with six regular neurons even though the program has dropped 50% of them.

While the program drops these neurons, it neither calculates nor trains them. However, the final neural network will use _all_ of these neurons for the output. As previously mentioned, the program only temporarily discards the neurons.
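The difference between training and the final network can be sketched in a few lines of NumPy. This is only an illustration: `dropout_forward` is a hypothetical helper, not the Keras source code, and the activation values are made up. It assumes the common "inverted dropout" convention, in which surviving activations are scaled up by `1 / (1 - rate)` during training so that nothing needs to be rescaled afterwards.

```python
import numpy as np

def dropout_forward(activations, rate, training, rng):
    """Illustrative dropout step (hypothetical helper, not the Keras source)."""
    if not training:
        # Final network: every neuron contributes, nothing is masked.
        return activations
    # Training: keep each neuron with probability (1 - rate).
    keep_mask = rng.random(activations.shape) >= rate
    # Inverted dropout: scale the survivors so the expected sum is unchanged.
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
hidden = np.array([0.8, 0.1, 0.5, 0.9, 0.3, 0.7])  # made-up activations

train_out = dropout_forward(hidden, rate=0.5, training=True, rng=rng)
infer_out = dropout_forward(hidden, rate=0.5, training=False, rng=rng)

print("training :", train_out)   # dropped entries are 0, survivors are doubled
print("inference:", infer_out)   # identical to `hidden`
```

Note that the array keeps its length of six in both cases; masked neurons are set to zero for one pass, not deleted.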

The program chooses different sets of neurons from the dropout layer during subsequent training iterations. Although we chose a probability of 50% for dropout, the computer will not necessarily drop three neurons. It is as if we flipped a coin for each of the dropout candidate neurons to choose if that neuron was dropped out. You must know that the program should never drop the bias neuron. Only the regular neurons on a dropout layer are candidates.
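The coin-flip description can be checked with a short simulation. The sketch below assumes a dropout layer with six regular neurons and a 50% rate, as in the figure; the number of iterations and the random seed are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_neurons = 6        # regular neurons in the dropout layer (bias excluded)
rate = 0.5           # probability that any one neuron is dropped
n_iterations = 1000  # arbitrary number of simulated training iterations

# One independent coin flip per neuron per iteration: True means "dropped".
dropped = rng.random((n_iterations, n_neurons)) < rate
dropped_per_iteration = dropped.sum(axis=1)

# Tally how often 0, 1, ..., 6 neurons were dropped.
counts = np.bincount(dropped_per_iteration, minlength=n_neurons + 1)
for k, c in enumerate(counts):
    print(f"{k} neurons dropped in {c} of {n_iterations} iterations")
print("average dropped:", dropped_per_iteration.mean())
```

The average is close to three, but individual iterations can drop anywhere from zero to all six neurons.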

The implementation of the training algorithm influences the process of discarding neurons. The dropout set frequently changes once per training iteration or batch. The program can also provide intervals where all neurons are present. Some neural network frameworks give additional hyper-parameters to allow you to specify exactly the rate of this interval.

## **Why does Dropout work?**

Why dropout is capable of decreasing overfitting is a common question. The answer is that dropout can reduce the chance of **_codependency_** developing between two neurons. Two neurons that develop codependency will not be able to operate effectively when one is dropped out. As a result, the neural network can no longer rely on the presence of every neuron, and it trains accordingly. This characteristic decreases its ability to memorize the information presented, thereby forcing generalization.

Dropout also decreases overfitting by **_forcing a bootstrapping process_** upon the neural network. Bootstrapping is a prevalent ensemble technique. Ensembling is a technique of machine learning that combines multiple models to produce a better result than those achieved by individual models. The ensemble is a term that originates from the musical ensembles in which the final music product that the audience hears is the combination of many instruments.  

**_Bootstrapping_** is one of the most simple ensemble techniques. The bootstrapping programmer simply trains several neural networks to perform precisely the same task. However, each neural network will perform differently because of some training techniques and the random numbers used in the neural network weight initialization. The difference in weights causes the performance variance. The output from this ensemble of neural networks becomes the average output of the members taken together. This process decreases overfitting through the consensus of differently trained neural networks.  

Dropout works somewhat like bootstrapping. You might think of each neural network that results from a different set of neurons being dropped out as an individual member in an ensemble. As training progresses, the program creates more neural networks in this way. However, dropout does not require the same amount of processing as bootstrapping. The new neural networks created are temporary; they exist only for a training iteration. The final result is also a single neural network rather than an ensemble of neural networks to be averaged together.
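The ensemble analogy can be made concrete with a toy calculation. The sketch below assumes a single linear output neuron fed by six hidden activations; the activations and weights are made-up numbers, and the inverted-dropout scaling of `1 / (1 - rate)` is an assumption. Under those assumptions, the average output of many temporary "thinned" networks approaches the output of the one full network. With nonlinear layers the match is only approximate.

```python
import numpy as np

rng = np.random.default_rng(1)
rate = 0.5
hidden = np.array([0.2, 0.5, 0.1, 0.9, 0.4, 0.7])     # made-up hidden activations
weights = np.array([1.0, -1.0, 0.5, 2.0, -0.5, 1.5])  # made-up output weights

# Output of the single, full network (no neurons dropped).
full_output = hidden @ weights

# Each random mask defines one temporary "thinned" network.
n_members = 20000
masks = rng.random((n_members, hidden.size)) >= rate
member_outputs = (masks * hidden / (1.0 - rate)) @ weights

print("full network output     :", round(float(full_output), 3))
print("mean of thinned networks:", round(float(member_outputs.mean()), 3))
```

This is why a dropout-trained model can be used as one network at prediction time instead of storing and averaging an ensemble.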

This short YouTube video shows how dropout works: [Dropout tutorial](https://youtu.be/NhZVe50QwPM?si=Zr-6qrPdE9YXTj3Q)


## Example 1: Dropout for Keras

For Example 1 we will create a neural network with 3 hidden layers and 2 dropout layers to demonstrate how to implement dropout layers in a classification neural network. The dataset for Example 1 is the [Body Performance Dataset](https://www.kaggle.com/datasets/kukuroo3/body-performance-data) that we have used in previous lessons (e.g. Class_05_2, Class_05_3).

This dataset has the following 12 categories:

* **age:** 20 ~64
* **gender:** M,F
* **height_cm:** (If you want to convert to feet, divide by 30.48)
* **weight_kg:**
* **body fat_%:**
* **diastolic:** diastolic blood pressure (min)
* **systolic:** systolic blood pressure (min)
* **gripForce:**
* **sit and bend forward_cm:**
* **sit-ups counts:**
* **broad jump_cm:**
* **class:** A,B,C,D ( A: best) / stratified

To help follow the code examples, Example 1 has been divided into 3 steps.

### Example 1 - Step 1: Create feature vector

The code in the cell below reads the Body Performance dataset, `bodyPerformance.csv`, from the course HTTPS server and creates a new DataFrame called `bpBigDF`. To speed up training, only 30% of `bpBigDF` is used to create the DataFrame for this example, `bpDF`.

The column `gender` is mapped (`M`=`0`, `F`=`1`). Only the columns `age`, `height_cm`, `weight_kg`, `diastolic`, `systolic` and `gripForce` are standardized, but all of the columns except `class` are used to create the x-values. Since we are building a classification neural network, the column `class` is One-Hot encoded to generate the y-values.

As always, both the x-values and the y-values must be converted to type `float32` to avoid errors during training.

In [None]:
# Example 1 - Step 1: Create feature vector

import pandas as pd
import numpy as np
import scipy.stats
from scipy.stats import zscore

# Read the data set
bpBigDF = pd.read_csv(
    "https://biologicslab.co/BIO1173/data/bodyPerformance.csv",
    na_values=['NA','?'])

# Only use 30% for neural network
bpDF=bpBigDF.sample(frac=0.30, random_state=2)

# Map Gender
mapping =  {'M': 0,
            'F': 1}
bpDF['gender'] = bpDF['gender'].map(mapping)

# Standardize ranges
bpDF['age'] = zscore(bpDF['age'])
bpDF['height_cm'] = zscore(bpDF['height_cm'])
bpDF['weight_kg'] = zscore(bpDF['weight_kg'])
bpDF['diastolic'] = zscore(bpDF['diastolic'])
bpDF['systolic'] = zscore(bpDF['systolic'])
bpDF['gripForce'] = zscore(bpDF['gripForce'])

# Generate list of columns for x
bpX_columns = bpDF.columns.drop('class')  # `class` is y-value

# Generate x-values as numpy array
bpX = bpDF[bpX_columns].values

# Convert x-values to float 32
bpX = np.asarray(bpX).astype('float32')

# One-Hot encode column containing y-values
dummies = pd.get_dummies(bpDF['class']) # Classification
bpCategories = dummies.columns
bpY = dummies.values

# Convert y-values to float 32
bpY = np.asarray(bpY).astype('float32')

# Print y categorical names
print(*bpCategories)

If your code is correct you should see the 4 different fitness levels in your output:
~~~text
A B C D
~~~
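To see what the One-Hot encoding in Step 1 produces, here is a minimal sketch on a toy column. The five labels are made up and stand in for the `class` column. The sketch also shows how `np.argmax()` turns each One-Hot row back into a category, which is the same operation Step 2 uses to score predictions.

```python
import numpy as np
import pandas as pd

# Toy labels standing in for the `class` column (made-up values).
labels = pd.Series(['C', 'A', 'B', 'A', 'D'], name='class')

dummies = pd.get_dummies(labels)        # one column per category
categories = dummies.columns            # category names in sorted order
y = dummies.values.astype('float32')    # shape (5, 4), a single 1.0 per row

# argmax recovers the column position of the 1.0 in each row.
recovered = categories[np.argmax(y, axis=1)]

print(y)
print(list(recovered))
```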

### Example 1 - Step 2: Keras with dropout for Classification

The code in the cell below creates a sequential neural network with 3 hidden layers of densely connected neurons. A dropout layer is added after the first and second hidden layers, but _not_ after the last hidden (3rd) layer. A dropout layer is not usually added after the last hidden layer.

Since this neural network is designed for classification, the output layer has 4 neurons -- one neuron for each fitness class. The number of output neurons is set by the argument `bpY.shape[1]` as demonstrated in the following code chunk:
~~~text
model.add(Dense(bpY.shape[1],activation='softmax')) # Output layer
~~~
The code is set up to take advantage of **_K_-fold Cross Validation**. The advantages of using this technique and how to implement it were demonstrated previously, in Class_05_2. The number of _K_-folds to be employed is set by the variable `numK=5`. During each _K_ turn of the `for` loop, a new neural network is created and trained for the number of epochs specified in the variable `EPOCHS=100`.

In [None]:
# Example 1 - Step 2: Keras with dropout for Classification

import pandas as pd
import numpy as np
import time

import sklearn
from sklearn import metrics
from sklearn.model_selection import KFold

import keras
from keras.models import Sequential
from keras.layers import Dense, Input
from keras.layers import Dropout
from keras import regularizers

# Set variables
EPOCHS = 100  # number of epochs for each loop
numK = 5     # number of K-folds

# Record start time
start_time = time.time()

# Cross-validate
kf = KFold(numK, shuffle=True, random_state=42)

# Initialize arrays and counter
oos_y = []
oos_pred = []
fold = 0

def create_and_compile_model(input_dim, output_dim):
    model = Sequential()
    model.add(Input(shape=(input_dim,)))  # Input
    model.add(Dense(50, activation='relu'))  # Hidden 1
    model.add(Dropout(0.5))  # Add dropout after Hidden 1
    model.add(Dense(25, activation='relu', activity_regularizer=regularizers.l1(1e-4)))  # Hidden 2
    model.add(Dropout(0.5))  # Add dropout after Hidden 2
    model.add(Dense(10, activation='relu', activity_regularizer=regularizers.l1(1e-4)))  # Hidden 3
    model.add(Dense(output_dim, activation='softmax'))  # Output layer
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    return model

print("STAND BY: TRAINING IS STARTING")
# Start Loop ----------------------------------------------------------------#
for train, test in kf.split(bpX):
    fold += 1
    print(f"Starting Fold #{fold}...")

    # Split data for this fold
    x_train = bpX[train]
    y_train = bpY[train]
    x_test = bpX[test]
    y_test = bpY[test]

    # Create and compile the model for this K fold
    model = create_and_compile_model(bpX.shape[1], bpY.shape[1])

    # Fit model
    model.fit(x_train, y_train, validation_data=(x_test, y_test), verbose=0, epochs=EPOCHS)

    # Use model to make predictions
    pred = model.predict(x_test)

    # Add actual y-values for the data used this fold
    oos_y.append(y_test)

    # Raw probabilities to chosen class (highest probability)
    pred = np.argmax(pred, axis=1)
    oos_pred.append(pred)

    # Measure this fold's accuracy
    y_compare = np.argmax(y_test, axis=1)  # For accuracy calculation
    score = metrics.accuracy_score(y_compare, pred)
    print(f"Fold score (accuracy): {score:.3f}")

# End Loop ----------------------------------------------------------------#

# Build the oos prediction list and calculate the error.
oos_y = np.concatenate(oos_y)
oos_pred = np.concatenate(oos_pred)
oos_y_compare = np.argmax(oos_y, axis=1)  # For accuracy calculation
score = metrics.accuracy_score(oos_y_compare, oos_pred)
print(f"Final score (accuracy): {score:.3f}")

# Write the cross-validated prediction
oos_y = pd.DataFrame(oos_y)
oos_pred = pd.DataFrame(oos_pred)
oosDF = pd.concat([oos_pred, oos_y], axis=1)

# Print elapsed time
elapsed_time = time.time() - start_time
print("Elapsed time: {}".format(hms_string(elapsed_time)))


If the code is correct, you should see something similar to the following output:

~~~text
STAND BY: TRAINING IS STARTING
Starting Fold #1...
26/26 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step
Fold score (accuracy): 0.578
Starting Fold #2...
26/26 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step
Fold score (accuracy): 0.531
Starting Fold #3...
26/26 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step
Fold score (accuracy): 0.540
Starting Fold #4...
26/26 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
Fold score (accuracy): 0.562
Starting Fold #5...
26/26 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
Fold score (accuracy): 0.549
Final score (accuracy): 0.552
Elapsed time: 0:03:17.59
~~~
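The final score above is computed on predictions pooled from all five folds. The following hand-rolled sketch illustrates why that pooling is fair: each row lands in the test set of exactly one fold. It uses plain NumPy with a made-up dataset size of 23 rows and is not how scikit-learn implements `KFold`.

```python
import numpy as np

n_samples, num_k = 23, 5   # made-up dataset size, 5 folds as in the example
rng = np.random.default_rng(42)

# Shuffle the row indices once, then cut them into num_k nearly equal folds.
folds = np.array_split(rng.permutation(n_samples), num_k)

oos_rows = []
for fold_number, test_idx in enumerate(folds, start=1):
    # Everything outside the current fold is training data.
    train_idx = np.setdiff1d(np.arange(n_samples), test_idx)
    print(f"Fold #{fold_number}: train={train_idx.size}, test={test_idx.size}")
    oos_rows.append(test_idx)

# Pooled across folds, the test rows cover the whole dataset exactly once.
oos_rows = np.concatenate(oos_rows)
print(sorted(oos_rows.tolist()) == list(range(n_samples)))
```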

### Example 1 - Step 3: Print out actual and predicted y-values

The code in the cell below prints out the predicted and the actual Body Performance classes for the out-of-sample (oos) individuals. The function `rename_col_by_index()`, created at the start of this lesson, is used to make the column labels more informative.

In [None]:
# Example 1 - Step 3: Print out actual and predicted y-values

# Rename columns
new_column_mapping = {0: 'Predicted Fitness Class', 1: 'Actual: 0'}
oosDF = rename_col_by_index(oosDF, new_column_mapping)

# Set display options
pd.set_option('display.max_rows', 8)
pd.set_option('display.max_columns', 8)

# Display DataFrame
display(oosDF)

If your code is correct, you should see something similar to the following table:

![___](https://biologicslab.co/BIO1173/images/class_05_4_Exm1C.png)

For the 8 predictions shown in the above table, 4 were correct and 4 were incorrect. That error rate is what you might expect with a final accuracy score of around 55%.
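In the table, the predicted class is an integer while the actual class is spread over One-Hot columns, so comparing them by eye takes some effort. The sketch below shows one way to translate both back to class letters. The six predictions and actual values are made up, and `categories` is a stand-in for `bpCategories`.

```python
import numpy as np
import pandas as pd

categories = pd.Index(['A', 'B', 'C', 'D'])   # stand-in for bpCategories

# Made-up out-of-sample results: integer predictions and One-Hot actuals.
pred_idx = np.array([0, 2, 1, 3, 3, 0])
actual_onehot = np.eye(4, dtype='float32')[[0, 1, 1, 3, 2, 0]]

table = pd.DataFrame({
    'Predicted': categories[pred_idx],
    'Actual': categories[np.argmax(actual_onehot, axis=1)],
})
table['Correct'] = table['Predicted'] == table['Actual']

print(table)
print("Fraction correct:", table['Correct'].mean())
```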

## **Exercise 1: Dropout for Keras**

For **Exercise 1** you are to create a classification neural network with 3 hidden layers and 2 dropout layers. The dataset you should use is the [Obesity Prediction Dataset](https://www.kaggle.com/datasets/mrsimple07/obesity-prediction) that we have seen previously in lessons Class_05_2 and  Class_05_3.

The 7 categories of obesity/demographics measurements are:

* **Age:** The age of the individual, expressed in years (Mean=49.9 yrs +/- 18.1)
* **Gender:** The gender of the individual coded Male and Female
* **Height:** The height of the individual measured in centimeters (Mean=170 cm +/- 10.3)
* **Weight:** The weight of the individual measured in kilograms (Mean=71.2 kg +/- 15.5)
* **BMI:** Body mass index, a calculated metric derived from the individual's weight and height (Mean=24.9 +/- 6.19)
* **PhysicalActivityLevel:** This variable quantifies the individual's level of physical activity (Mean=2.53 +/- 1.12)
* **ObesityCategory:** Categorization of individuals based on their BMI into different obesity categories. The Obesity Categories are: `Underweight`, `Normal weight`, `Overweight`, and `Obese`.

You are to construct a neural network that can predict the correct `ObesityCategory`.

To help you in your coding, **Exercise 1** has been divided into 3 steps.

### **Exercise 1 - Step 1: Create feature vector**

In the cell below, write the code to read the Obesity Prediction dataset `obesity_prediction.csv` and create a new DataFrame called `opDF`. Since this dataset isn't too large, use the **entire** dataset to create `opDF` (i.e. don't sample it).

You will need to map the column `Gender` to convert the strings `Male` and `Female` to integers. Also standardize the columns `Age`, `Height`, `Weight` and `BMI` to their Zscores.

Since the column `ObesityCategory` will be the y-value for training your neural network, make sure to drop it when generating your list of column names for generating x-values (`opX_columns`). You can use the following code chunk to create your x-values:
~~~text
# Generate x-values as numpy array
opX = opDF[opX_columns].values
~~~
Since you are building a neural network for classification, you will need to One-Hot encode the `ObesityCategory` column. Use the `dummies.values`, created as part of your One-Hot encoding, as your y-value, `opY`. You should also save the `dummies.columns` in a variable called `obCategories`.

Don't forget to convert all your x-values and y-values to `float32` or you will get an error message when you try to train your neural network.

Finally, print out the categorical values (names) that were One-Hot encoded using the "starred" print statement:
~~~text
# Print y categorical names
print(*obCategories)
~~~

In [None]:
# Insert your code for Exercise 1 - Step 1 here



If your code is correct, you should see the names of the 4 obesity categories:
~~~text
Normal weight Obese Overweight Underweight
~~~

### **Exercise 1 - Step 2: Keras with dropout for Classification**

In the cell below, create a sequential neural network with 3 hidden layers of densely connected neurons. Add a dropout layer after the first and second hidden layers, but not after the 3rd hidden layer.

You can reuse the code for Example 1 - Step 2 after you change the names of your x- and y-values.

In [None]:
# Insert your code for Exercise 1 - Step 2 here



Training of your neural network should run faster than the one in Example 1.

If your code is correct, you should see something similar to the following output:
~~~text
STAND BY: TRAINING IS STARTING
Starting Fold #1...
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step
Fold score (accuracy): 0.940
Starting Fold #2...
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step
Fold score (accuracy): 0.970
Starting Fold #3...
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step
Fold score (accuracy): 0.935
Starting Fold #4...
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step
Fold score (accuracy): 0.955
Starting Fold #5...
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step
Fold score (accuracy): 0.940
Final score (accuracy): 0.948
Elapsed time: 0:02:35.02
~~~

The final accuracy score for your neural network is about 95%, which is very good. If your accuracy score is much lower, it probably means that you have an error in **Exercise 1 - Step 1**.

### **Exercise 1 - Step 3: Print out actual and predicted y-values**

In the cell below, print out the predicted and the actual Obesity Categories for the out-of-sample (oos) individuals.

Use the following code chunk to change the column names to make them easier to interpret.
~~~text
# Rename columns
new_column_mapping = {0: 'Predicted Obesity Category', 1: 'Actual: 0'}
oosDF = rename_col_by_index(oosDF, new_column_mapping)
~~~

In [None]:
# Insert your code for Exercise 1 - Step 3 here



If your code is correct, you should see something similar to the following table:

![___](https://biologicslab.co/BIO1173/images/class_05_4_Exe1C.png)

All 8 predictions shown in the above table are correct. Almost perfect predictions are what you might expect with a final accuracy score of around 95%.

## **Lesson Turn-in**

When you have completed and run all of the code cells, use the **File --> Print.. --> Save to PDF** to generate a PDF of your CoLab notebook. Save your PDF as `Class_05_4.lastname.pdf` where _lastname_ is your last name, and upload the file to Canvas.

## **Poly-A Tail**

## **VisiCalc**

![__](https://upload.wikimedia.org/wikipedia/commons/7/7a/Visicalc.png)

**VisiCalc** ("visible calculator") is the first spreadsheet computer program for personal computers, originally released for the Apple II by VisiCorp on October 17, 1979. It is considered the killer application for the Apple II, turning the microcomputer from a hobby for computer enthusiasts into a serious business tool, and then prompting IBM to introduce the IBM PC two years later. More than 700,000 copies were sold in six years, and up to 1 million copies over its history.

Initially developed for the Apple II computer using a 6502 assembler running on the Multics time-sharing system, VisiCalc was ported to numerous platforms, both 8-bit and some of the early 16-bit systems. To do this, the company developed porting platforms that produced bug compatible versions. The company took the same approach when the IBM PC was launched, producing a product that was essentially identical to the original 8-bit Apple II version. Sales were initially brisk, with about 300,000 copies sold.

When Lotus 1-2-3 was launched in 1983, taking full advantage of the expanded memory and screen of the IBM PC, VisiCalc sales declined so rapidly that the company was soon insolvent. In 1985, Lotus Development purchased the company and ended sales of VisiCalc.

**History**

> VISICALC represented a new idea of a way to use a computer and a new way of thinking about the world. Where conventional programming was thought of as a sequence of steps, this new thing was no longer sequential in effect: When you made a change in one place, all other things changed instantly and automatically.
 — Ted Nelson

Dan Bricklin conceived of VisiCalc while watching a presentation at Harvard Business School. The professor was creating a financial model on a blackboard that was ruled with vertical and horizontal lines (resembling accounting paper) to create a table, and he wrote formulas and data into the cells. When the professor found an error or wanted to change a parameter, he had to erase and rewrite several sequential entries in the table. Bricklin realized that he could replicate the process on a computer using an "electronic spreadsheet" to view results of underlying formulae.

Bob Frankston joined Bricklin at 231 Broadway, Arlington, Massachusetts, and the pair formed the Software Arts company, and developed the VisiCalc program in two months during the winter of 1978–79. Bricklin wrote:

> with the years of experience we had at the time we created VisiCalc, we were familiar with many row/column financial programs. In fact, Bob had worked since the 1960s at Interactive Data Corporation, a major timesharing utility that was used for some of them and I was exposed to some at Harvard Business School in one of the classes.

Bricklin was referring to the variety of report generators that were in use at that time, including Business Planning Language (BPL) from International Timesharing Corporation (ITS) and Foresight from Foresight Systems. However, these earlier timesharing programs were not completely interactive, and they pre-dated personal computers.

Frankston described VisiCalc as a "magic sheet of paper that can perform calculations and recalculations [which] allows the user to just solve the problem using familiar tools and concepts". The Personal Software company began selling VisiCalc in mid-1979 for under US\$100 (equivalent to \$420 in 2023), after a demonstration at the fourth West Coast Computer Faire and an official launch on June 4 at the National Computer Conference. It requires an Apple II with 32K of random-access memory (RAM), and supports saving files to magnetic tape cassette or to the Apple Disk II floppy disk system.

VisiCalc was unusually easy to use and came with excellent documentation. Apple's developer documentation cited the software as an example of one with a simple user interface. Observers immediately noticed its power. Ben Rosen speculated in July 1979, that "VisiCalc could someday become the software tail that wags (and sells) the personal computer dog". For the first 12 months, it was only available for Apple II, and became its killer app. John Markoff wrote that the computer was sold as a "VisiCalc accessory", and many bought \$2,000 (equivalent to \$8,400 in 2023) Apple computers to run the \$100 software — more than 25% of those sold in 1979 were reportedly for VisiCalc — even if they already owned other computers. Steve Wozniak said that small businesses, not the hobbyists he and Steve Jobs had expected, purchased 90% of Apple IIs. Apple's rival Tandy Corporation used VisiCalc on Apple IIs at their headquarters. Other software supports its Data Interchange Format (DIF) to share data. One example is the Microsoft BASIC interpreter supplied with most microcomputers that ran VisiCalc. This allowed skilled BASIC programmers to write features, such as trigonometric functions, that VisiCalc lacked.

Bricklin and Frankston originally intended to fit the program into 16k memory, but they later realized that the program needed at least 32k. Even 32k is too small to support some features that the creators wanted to include, such as a split screen for text and graphics. However, Apple eventually began shipping all Apple IIs with 48k memory following a drop in RAM prices, enabling the developers to include more features. The initial release supported tape cassette storage, but that was quickly dropped.

At VisiCalc's release, Personal Software promised to port the program to other computers, starting with those with the MOS Technology 6502 microprocessor, and versions appeared for Atari 8-bit computers and Commodore PET. Both of those were easy, because those computers have the same CPU as Apple II, and large portions of code were reused. The PET version, which contains two separate executables for 40 and 80-column models, was widely criticized for having a very small amount of worksheet space due to the developers' inclusion of their own custom DOS, which uses a large amount of memory. The PET only has 32k versus Apple II's available 48k.

Other ports followed for Apple III, the Zilog Z80-based Tandy TRS-80 Model I, Model II, Model III, Model 4, and Sony SMC-70. The TRS-80 Model I and Sony SMC-70 ports are the only versions of VisiCalc without copy protection. The HP 125 and Sony SMC-70 ports are the only CP/M versions. Most versions are disk-based, but the PET VisiCalc came with a ROM chip that the user must install in one of the motherboard's expansion ROM sockets. The most important port is for the IBM PC, and VisiCalc became one of the first commercial packages available when the IBM PC shipped in 1981. It quickly became a best-seller on this platform, though severely limited to be compatible with the versions for the 8-bit platforms. It is estimated that 300,000 copies were sold on the PC, bringing total sales to about 1 million copies.

By 1982, VisiCalc's price had risen from \$100 to \$250 (equivalent to \$790 in 2023). Several competitors appeared in the market, such as SuperCalc and Multiplan, each of which have more features and corrected deficiencies in VisiCalc, but could not overcome its market dominance. A more dramatic change occurred with the 1983 launch of Lotus Development Corporation's Lotus 1-2-3, created by former Personal Software/VisiCorp employee Mitch Kapor, who had written VisiTrend and VisiPlot. Unlike the IBM PC version of VisiCalc, 1-2-3 was written to take full advantage of the PC's increased memory, screen, and performance. Yet it was designed to be as compatible as possible with VisiCalc, including the menu structure, to allow VisiCalc users to easily migrate to 1-2-3.

1-2-3 was almost immediately successful, and in 1984, InfoWorld wrote that sales of VisiCalc were "rapidly declining", stating that it was "the first successful software product to have gone through a complete life cycle, from conception in 1978 to introduction in 1979 to peak success in 1982 to decline in 1983 to a probable death according to industry insiders in 1984". The magazine added that the company was slow to upgrade the software, only releasing an Advanced Version of VisiCalc for Apple II in 1983, and announcing one for the IBM PC in 1984. By 1985, VisiCorp was insolvent. Lotus Development acquired Software Arts, and ended sales of the application.