<a href="https://colab.research.google.com/github/wel51x/DS-Unit-4-Sprint-4-Deep-Learning/blob/master/My_LS_DS_444_AGI_and_The_Future_Assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lambda School Data Science - Artificial General Intelligence and The Future

![Future City](https://upload.wikimedia.org/wikipedia/commons/thumb/c/ce/City-of-the-future.jpg/640px-City-of-the-future.jpg)

# Lecture

## Defining Intelligence

A straightforward definition of Artificial Intelligence would simply be "intelligence, created from technology rather than biology." But that simply raises the question - what is *intelligence*?

In the early history of computers, this seemed like an easier question. Intelligence meant solving tricky problems - things that took time and mental effort for a human to figure out.

Defined that way, computers have made a litany of intelligent achievements over the years:
- Arithmetic
- Logic
- Chess
- Go
- StarCraft
- Mathematical proofs
- Understanding natural language
- Generating natural language
- Understanding images
- Generating images
- Making medical diagnoses
- Fitting and *optimizing* ML models

And many more - every time you fit a simple regression, you're facilitating an act of artificial intelligence. You're writing code that will (hopefully) understand and generalize based on data, giving a "human-like" ability to intuit and predict something.

## "General" Intelligence - a moving target

But, somehow, that isn't what most people *really* mean when they talk about AI.

![And they both react poorly to showers.](https://imgs.xkcd.com/comics/ai.png)

Somewhere that word "general" snuck in, and now we're concerned about "Artificial General Intelligence." So, what is that?

![Data](https://upload.wikimedia.org/wikipedia/en/0/09/DataTNG.jpg)

The inspiration is likely characters such as the above, but that's not a definition. Intuitively the claim is "computers that can be thrown in a variety of environments and learn without guidance", but another good definition (based on how people use the term) may simply be "whatever we haven't figured out how to get computers to do yet."

Repeatedly, claims are made about tasks that will require a "true AI" to achieve. Then, when those tasks are completed, the bar is moved, and "true AI" is somehow always a bit further off.

## AI - Hype versus Value

Hot off the presses! [Google launches an end-to-end AI platform](https://techcrunch.com/2019/04/10/google-expands-its-ai-services/)!

...

What does that mean? Well, it might mean a lot, but it's a little unclear what. Some selected [Hacker News](https://news.ycombinator.com/item?id=19626275) comments:

> This platform focuses not on the this-AI-is-magic-and-can-solve-everything like many AI SaaS startups announced on Hacker News, but focuses on how to actually integrate this AI into production workflows, which is something I wish was discussed more often in AI. -- minimaxir

> Looks like Google is taking over Cloud (from AWS) for AI by building an ecosystem and building tools for non Data scientists - consumer level product. Surely IBM can do similar thing with their recent Redhat acquisition, but will they ? -- amrrs

> I work in building and deploying production ML/AI models but I'm having a lot of trouble cutting through the marketing jargon in this article and on Google's website as well. Can someone explain what this does in engineering terms? How does this differ from something like AWS Sagemaker? -- chibg10

> This will make a bunch of startup's life really hard. I think it makes it harder to justify investing in your own ML pipeline or even building your own models for many use cases. -- petard

One thing it definitely means - AI is a hot keyword, and people making hiring and other corporate decisions will be on the look out for it, even if they're not sure what it is.

So - yes, you *do* know AI. AI is a real thing, and you are capable of using "artificial" technology to bring about real *intelligence* and insight.

Do you know how to make an intelligent anthropomorphic android? No - and nobody else does yet, either. And that's OK. There's still lots of cool advances and things to learn and build.

## Automation, for good and ill

It is worth spending a moment considering the double-edged sword that is automation. This story did not begin with artificial intelligence, or even statistics or mathematics - it began when the first tool inventor figured out how to make something clever like a lever or a wheel, and use it to reduce the amount of labor needed to achieve some task.

In the modern day we talk about automation, but in practice most technology is best considered as a *productivity multiplier* - all businesses still need at least *some* humans around, if nothing else to make policy decisions and collect profit. But the productivity of each individual person can be greatly enhanced through the use of technology.

Consider farming - formerly a signification source of employment (and also small family owned farms), technology has tranformed it into a large scale industry where a handful of people produce as much as many more did before. This progression has happened in many areas - fortunately, it is usually accompanied by job growth and opportunity as new markets and services are created by technology as well.

So, is it different now? Maybe - "history will say" is the only safe stance. But we are automating work at an accelerating rate, and it's unclear where all this growth is going and where the opportunities will be. There's a pretty good bet that it'll involve computers and data - and that's probably a large part of why you're here!

The purpose of this section is not to convince you of anything - it is just to make you think. As a Data Scientist, you will have an outsized impact on society, and it is your responsibility to consider that impact and what you want to do with it.

**Important caveat** - think and engage with society, *but* strive to not be strident or unduly certain when you do so. Broadcasting political beliefs, especially while on the job market, usually closes more doors than it opens. So, consider perspectives, and encourage dialogue - don't just (re)broadcast outrage at the latest injustice.

In [0]:
!pip install tpot



In [0]:
import pandas as pd
from tpot import TPOTRegressor

df = pd.read_csv('car_regression.csv')
df.head()

Unnamed: 0,make,price,body,mileage,engV,engType,registration,year,drive
0,23,15500.0,0,68,2.5,1,1,2010,1
1,50,20500.0,3,173,1.8,1,1,2011,2
2,50,35000.0,2,135,5.5,3,1,2008,2
3,50,17800.0,5,162,1.8,0,1,2012,0
4,55,16600.0,0,83,2.0,3,1,2013,1


In [0]:
df.describe()

Unnamed: 0,make,price,body,mileage,engV,engType,registration,year,drive
count,8495.0,8495.0,8495.0,8495.0,8495.0,8495.0,8495.0,8495.0,8495.0
mean,46.535491,16185.453305,2.302295,141.744202,2.568337,1.650618,0.941613,2006.500883,0.575868
std,24.526251,24449.641512,1.610307,97.464062,5.387238,1.341282,0.234488,6.925907,0.741235
min,0.0,259.35,0.0,0.0,0.1,0.0,0.0,1959.0,0.0
25%,23.0,5490.0,1.0,74.0,1.6,0.0,1.0,2004.0,0.0
50%,50.0,9500.0,3.0,130.0,2.0,1.0,1.0,2008.0,0.0
75%,68.0,17145.6,3.0,197.0,2.5,3.0,1.0,2011.0,1.0
max,82.0,547800.0,5.0,999.0,99.99,3.0,1.0,2016.0,2.0


In [0]:
from sklearn.model_selection import train_test_split

X = df.drop('price', axis=1).values
X_train, X_test, y_train, y_test = train_test_split(
    X, df['price'].values, train_size=0.75, test_size=0.25)

In [0]:
%%time

tpot = TPOTRegressor(generations=5, population_size=20, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))

HBox(children=(IntProgress(value=0, description='Optimization Progress', max=120, style=ProgressStyle(descript…

Generation 1 - Current best internal CV score: -133141429.03390601
Generation 2 - Current best internal CV score: -129432295.79929402
Generation 3 - Current best internal CV score: -129432295.79929402
Generation 4 - Current best internal CV score: -129432295.79929402
Generation 5 - Current best internal CV score: -127465051.18964705

Best pipeline: GradientBoostingRegressor(input_matrix, alpha=0.99, learning_rate=0.1, loss=lad, max_depth=9, max_features=0.25, min_samples_leaf=10, min_samples_split=15, n_estimators=100, subsample=0.9000000000000001)
-120411233.98885702
CPU times: user 7min 16s, sys: 9.28 s, total: 7min 25s
Wall time: 7min 15s


In [0]:
tpot.predict(X_test)

array([ 6941.76312027,  6490.98506205,  5716.99816711, ...,
        9482.15768688,  8170.80266469, 22037.23983594])

In [0]:
y_test

array([ 7000.,  6100.,  6500., ..., 13200.,  7800., 22700.])

It works - but it looks like we're not quite out of a job yet.

## So, is AutoML an "AGI"?

**No** - it's a search (grid or possibly using genetic/tree pruning/etc. heuristics) in parameter space, with some clever type inference heuristics and a slick interface.

But, it *is* artificial, it *does* give intelligent results, and (like most technology) it *multiplies* productivity. It's not going to "take our jobs" - but it does mean that, in some situations, one data scientist will be able to do what formerly took several to achieve.

## Is Artificial General Intelligence dangerous?

![I'm working to bring about a superintelligent AI that will eternally torment everyone who failed to make fun of the Roko's Basilisk people.](https://imgs.xkcd.com/comics/ai_box_experiment.png)

There's been much philosophizing, thought experimenting, and even some genuine advocacy and policy considerations about the impact of a "true" AGI on human society. Most of these analyses essentially consider the AGI as an unfathomable deity, thinking and moving in ways well beyond human comprehension.

Consider the [paperclip maximizer](https://en.wikipedia.org/wiki/Instrumental_convergence#Paperclip_maximizer):

> Suppose we have an AI whose only goal is to make as many paper clips as possible. The AI will realize quickly that it would be much better if there were no humans because humans might decide to switch it off. Because if humans do so, there would be fewer paper clips. Also, human bodies contain a lot of atoms that could be made into paper clips. The future that the AI would be trying to gear towards would be one in which there were a lot of paper clips but no humans. — Nick Bostrom

This is an example of *instrumental convergence* - the idea that, if an AGI were to pursue an unbounded goal (a natural instruction like "Maximize the health of all humans") it may push it in extremely unexpected ways (put all humans in vats of goo, to both preserve them and prevent them from disabling it, since its existence is also of value to help humans).

Is this a *realistic* concern? Well, maybe eventually - but pretty obviously not an immediate one. There are many more prominent challenges involving tech and society - privacy, economic growth, equality, education - and even *if* AGI existed it's not clear how they would have the means to enact such fantastic plans. Killer robot armies make for good TV, but at some step there's likely a human with an off switch.

## Where is AI going, and where does it leave us?

![Lambda calculus? More like SHAMda calculus, amirite?](https://imgs.xkcd.com/comics/ai_research.png)

On the one hand, we live in a remarkable time. The explosion of technology from WWII to present has brought about countless innovations, greatly increased median life expectancy and GDP, and shows no sign of slowing down.

On the other hand, the more things change the more they stay the same. Humans are still Homo sapiens, with the same brains we've had for many millenia. [Dunbar's number](https://en.wikipedia.org/wiki/Dunbar's_number) stymies our attempts to be globally considerate and aware, and at the end of the day it seems like the vast majority of our behavior is as it ever has been - just with shinier toys.

So, what will happen? Will technology usher in a utopia, where automation finally relieves us all of burdensome tasks and we are free to explore science, art, and leisure? Or are we doomed to a dystopia, where increased production is also increasingly centralized and the vast majority of humanity becomes a permanent underclass in a postmodern cyberpunk world?

Probably neither - both are extreme points along a continuum of possibility. But wherever we do end up, it is all but certain that AI (that is, technology generating insights and signal) will be a key part of it.

## And what about A*G*I?

> "I think, therefore I am." -- René Descartes

> "I am a strange loop." -- Douglas Hofstadter

Artificial General Intelligence is, as discussed, a moving target. Perhaps what we're looking for isn't intelligence, but consciousness - and specifically, consciousness *we* recognize and empathize with. Much like all parents, us humans want to foster something new in our image, and see it succeed in a way we appreciate.

It's not clear if technology will ever *really* get there. The structure and approach to artificial intelligence is inherently, well, artificial - some things like neural networks are "inspired" by biology, but still very different (far fewer connections, but far faster with more data). Perhaps computers really already *are* intelligent, just not in a way we recognize.

And if we ever do succeed at making our virtual progeny, we may find it bittersweet - not because they will inevitably destroy us (though they probably will outlast us), but simply because it will then lead us to wonder what is so special about us in the first place. If we can create an AGI from metal and sand, then are we not just mechanisms of a different sort?

# Assignment

Use either [automl-gs](https://github.com/minimaxir/automl-gs) or [TPOT](https://github.com/EpistasisLab/tpot) to solve at least two of your prior assignments, projects, or other past work (any time you fit a classification or regression model). Report the results, and compare/contrast with the results you found when you worked on it using your "human" ML approach.

Note - these tools promise a lot, but the reality is that you may have to debug a bit and figure out getting your data in a format that it recognizes. Welcome to the cutting edge - at least there's still plenty of work to do!

###LS_DS_433_Keras_Assignment

Using Boston Housing - fashion_mnist ran forever!

In [0]:
# TODO - ✨
import pandas as pd
from tpot import TPOTRegressor, TPOTClassifier
import numpy as np
import keras
from sklearn.model_selection import train_test_split
from sklearn.metrics.scorer import make_scorer
import sklearn
from sklearn.metrics import roc_auc_score

In [0]:
# Make a custom metric function
def my_custom_accuracy(y_true, y_pred):
    return float(sum(y_pred == y_true)) / len(y_true)

# Make a custom a scorer from the custom metric function
# Note: greater_is_better=False in make_scorer below would mean that the scoring function should be minimized.
my_custom_scorer = make_scorer(my_custom_accuracy, greater_is_better=True)

In [0]:
# Skipping - runs forever
#from keras.datasets import fashion_mnist

#(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()
#X_train = X_train.reshape(60000, 784).astype('float32') /255
#X_test = X_test.reshape(10000, 784).astype('float32') /255


In [0]:
from keras.datasets import boston_housing
# load data
(X_train, y_train), (X_test, y_test) = boston_housing.load_data()

# normalize
X_train = X_train/X_train.max(axis=0)

In [30]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((404, 13), (102, 13), (404,), (102,))

In [54]:
%%time

tpot = TPOTRegressor(generations=5, population_size=20, verbosity=2,
#                     n_jobs=-1, scoring=my_custom_scorer)
                     n_jobs=-1)
tpot.fit(X_train, y_train)
print("TPOT Score:", tpot.score(X_test, y_test))

HBox(children=(IntProgress(value=0, description='Optimization Progress', max=120, style=ProgressStyle(descript…

Generation 1 - Current best internal CV score: -14.02713117922806
Generation 2 - Current best internal CV score: -13.759563730393054
Generation 3 - Current best internal CV score: -12.67086481994507
Generation 4 - Current best internal CV score: -12.409412441747262
Generation 5 - Current best internal CV score: -12.409412441747262

Best pipeline: RandomForestRegressor(XGBRegressor(input_matrix, learning_rate=0.1, max_depth=3, min_child_weight=8, n_estimators=100, nthread=1, subsample=0.45), bootstrap=True, max_features=0.9500000000000001, min_samples_leaf=5, min_samples_split=16, n_estimators=100)
-87.76869167232162
CPU times: user 10.4 s, sys: 614 ms, total: 11 s
Wall time: 1min 14s


In [34]:
sorted(sklearn.metrics.SCORERS.keys())

['accuracy',
 'adjusted_mutual_info_score',
 'adjusted_rand_score',
 'average_precision',
 'balanced_accuracy',
 'brier_score_loss',
 'completeness_score',
 'explained_variance',
 'f1',
 'f1_macro',
 'f1_micro',
 'f1_samples',
 'f1_weighted',
 'fowlkes_mallows_score',
 'homogeneity_score',
 'mutual_info_score',
 'my_custom_accuracy',
 'neg_log_loss',
 'neg_mean_absolute_error',
 'neg_mean_squared_error',
 'neg_mean_squared_log_error',
 'neg_median_absolute_error',
 'normalized_mutual_info_score',
 'precision',
 'precision_macro',
 'precision_micro',
 'precision_samples',
 'precision_weighted',
 'r2',
 'recall',
 'recall_macro',
 'recall_micro',
 'recall_samples',
 'recall_weighted',
 'roc_auc',
 'v_measure_score']

In [55]:
tpot.predict(X_test)

array([25.30775757, 25.80842108, 26.17217644, 25.78636394, 25.89888601,
       25.63203903, 26.02417473, 26.02417473, 27.4399327 , 25.63203903,
       25.63203903, 25.63203903, 25.30775757, 26.17217644, 25.32593938,
       25.5902118 , 25.83332108, 26.17217644, 25.63203903, 23.82909796,
       25.32593938, 25.63203903, 27.45291821, 25.73249385, 26.02417473,
       25.63203903, 25.80842108, 26.02417473, 25.30775757, 26.13034921,
       25.80842108, 25.63203903, 30.40119996, 25.70400288, 25.63203903,
       25.30775757, 22.94249512, 25.32593938, 25.78636394, 26.02417473,
       26.02417473, 25.80842108, 25.30775757, 25.63203903, 28.45638579,
       26.17217644, 26.13034921, 25.63203903, 25.30775757, 25.70400288,
       26.02417473, 25.80842108, 25.30775757, 25.63203903, 25.79149385,
       26.17217644, 25.30775757, 25.5902118 , 30.40119996, 26.06600196,
       25.32593938, 25.30775757, 25.34799653, 25.63203903, 25.77432108,
       26.13034921, 25.30775757, 26.02417473, 25.67003145, 25.30

###Heart Disease dataset from UCI

In [0]:
df = pd.read_csv('https://raw.githubusercontent.com/wel51x/Data/master/heart.csv')
X = df.drop('target', axis=1).values
y = df['target'].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, test_size=0.2)

In [57]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape, 

((242, 13), (61, 13), (242,), (61,))

In [58]:
%%time

tpot = TPOTClassifier(generations=5, population_size=20, random_state=42, verbosity=2, scoring='roc_auc')
tpot.fit(X_train, y_train)
print("TPOT Score:", tpot.score(X_test, y_test))

HBox(children=(IntProgress(value=0, description='Optimization Progress', max=120, style=ProgressStyle(descript…

Generation 1 - Current best internal CV score: 0.8924322590989258
Generation 2 - Current best internal CV score: 0.8967291967291967
Generation 3 - Current best internal CV score: 0.8980920314253649
Generation 4 - Current best internal CV score: 0.8980920314253649
Generation 5 - Current best internal CV score: 0.9051787718454385

Best pipeline: BernoulliNB(OneHotEncoder(RobustScaler(GradientBoostingClassifier(input_matrix, learning_rate=0.001, max_depth=1, max_features=0.2, min_samples_leaf=5, min_samples_split=7, n_estimators=100, subsample=1.0)), minimum_fraction=0.05, sparse=False, threshold=10), alpha=100.0, fit_prior=True)
TPOT Score: 0.9118279569892473
CPU times: user 45.5 s, sys: 4.08 s, total: 49.6 s
Wall time: 44.7 s


In [59]:
tpot.predict(X_test)

array([1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1,
       1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0,
       1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0])

# Resources and Stretch Goals

Stretch goals
- Apply AutoML to more data, including data you've not analyzed or data you're considering for project work
- Try to work with the GPU/TPU options, and see if you can accelerate your AutoML
- Check out other competing AutoML systems (see resources or search and share - many are cloud hosted which is why we went with this)
- Write a blog post summarizing your experience learning Data Science at Lambda School!

Resources
- [What to expect from AutoML software](https://epistasislab.github.io/tpot/using/#what-to-expect-from-automl-software)
- [TPOT examples](https://epistasislab.github.io/tpot/examples/)
- [Google Cloud AutoML](https://cloud.google.com/automl/) - the Google offering in the AutoML space (also has vision, video, NLP, and translation)
- [Microsoft AutoML](https://www.microsoft.com/en-us/research/project/automl/)
- [AutoML.org](https://www.automl.org)
- [Ludwig](https://uber.github.io/ludwig/) - a toolbox for deep learning that doesn't require coding, from Uber
- [USENIX Security '18-Q: Why Do Keynote Speakers Keep Suggesting That Improving Security Is Possible?](https://youtu.be/ajGX7odA87k) - a humorous but informative presentation by James Mickens, focused on security but with a consideration of data and machine learning