In [1]:
import numpy as np

# 1. [What a Deep Neural Network thinks about your #selfie](http://karpathy.github.io/2015/10/25/selfie/)
* By Andrej Karpathy

## ConvNets
* Recognize various things in photo
* Developed in the 1980s
* Ignored until 2012
* Performance:
 * simple
 * fast
 * accurate
* Slides small learned filters across an image
* Similar to a child learning features to look for
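
To make the filter sliding concrete, here is a minimal sketch in plain NumPy. The helper name `slide_filter`, the toy image, and the hand-picked edge filter are all assumptions for illustration; a real ConvNet learns its filter values and stacks many such layers.

```python
import numpy as np

def slide_filter(image, kernel):
    """Slide `kernel` over `image` and take one sum of products per position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Toy 4x4 image: dark left half, bright right half
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Hand-picked vertical-edge filter (hypothetical values, not learned)
kernel = np.array([
    [-1, 1],
    [-1, 1],
], dtype=float)

response = slide_filter(image, kernel)
print(response)
```

The response is largest in the middle column, where the filter sits on the dark-to-bright edge, and zero where the patch is uniform.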

## ConvNets and selfies
* Can be used to score
* Score can be used to find ultimate crop
* Twitter @deepselfie

# 2. Basics of Linear Algebra

## Matrices
* *m*x*n* means *m* rows and *n* columns (height *m*, width *n*)
* *m*x*n* consistent with numpy's *shape*
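* **Multiplication**: each entry is the dot product of a row of the 1st matrix with a column of the 2nd
* $C = AB$
* $C_{i,j} = a_i\cdot b_j$, where $a_i$ is row $i$ of $A$ and $b_j$ is column $j$ of $B$
* $(m, n) = (m, k)\cdot(k, n)$: the inner dimensions must match
* NOT commutative: in general $AB \neq BA$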

In [2]:
# A has shape (2, 3)
A = np.array([
    [2, 0, 1],
    [3, 1, 2],
])

# B has shape (3, 4)
B = np.array([
    [8, 1, 1, 3],
    [0, 2, 7, 9],
    [1, 1, 0, 3],
])

# (2, 3) x (3, 4) -> (2, 4)
np.dot(A, B)

array([[17,  3,  2,  9],
       [26,  7, 10, 24]])
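
As a quick check of the rules above, the sketch below recomputes one entry of the product by hand and shows that swapping the operands fails or changes the result. The small matrices `P` and `Q` are made up for illustration.

```python
import numpy as np

A = np.array([[2, 0, 1],
              [3, 1, 2]])          # shape (2, 3)
B = np.array([[8, 1, 1, 3],
              [0, 2, 7, 9],
              [1, 1, 0, 3]])       # shape (3, 4)

C = A @ B                          # shape (2, 4); same as np.dot(A, B) for 2-D arrays

# Each entry is the dot product of a row of A with a column of B
i, j = 1, 2
entry = np.dot(A[i, :], B[:, j])   # 3*1 + 1*7 + 2*0

# Reversing the order fails here: (3, 4) x (2, 3) has mismatched inner dimensions
try:
    B @ A
    reversed_ok = True
except ValueError:
    reversed_ok = False

# Even for square matrices the order generally matters
P = np.array([[1, 2], [3, 4]])
Q = np.array([[0, 1], [1, 0]])
commutes = np.array_equal(P @ Q, Q @ P)
```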

## Vectors
* Single column (*n*x1) or row (1x*n*)
* **Dot-product**: sum of corresponding products (same size)

In [3]:
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# 1-D arrays have no row/column orientation, so np.dot just sums the products
np.dot(a, b)

70

# 3. [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)
* By Andrej Karpathy

## RNNs
* **Recurrent Neural Networks**
* Traditional ML constrained by fixed I/O dimensions
* RNNs bypass said limitation; work with sequential data
* Variations
 1. fixed->fixed
 1. fixed->sequence
 1. sequence->fixed
 1. sequence->sequence
 1. sequence->sequence (synced)
* Turing-complete (can simulate any program)
* Can also process fixed-size data sequentially
* Single function step
 * vector x -> RNN -> vector y
 * y = rnn.step(x)
 * $h_t=\tanh(W_{hh}h_{t-1}+W_{xh}x_t)$
 * $y_t=W_{hy}h_t$
* Also influenced by past steps; i.e. memory
* Add layers by stacking RNNs
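
The step function above can be sketched in a few lines of NumPy. The layer sizes, the random initialization, and the seed are assumptions for illustration only; the weights here are untrained.

```python
import numpy as np

class RNN:
    def __init__(self, input_size, hidden_size, output_size, seed=0):
        # Small random weights (assumed initialization, not a trained model)
        rng = np.random.default_rng(seed)
        self.W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
        self.W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
        self.W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))
        self.h = np.zeros(hidden_size)  # hidden state: the network's memory

    def step(self, x):
        # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
        self.h = np.tanh(self.W_hh @ self.h + self.W_xh @ x)
        # y_t = W_hy h_t
        return self.W_hy @ self.h

rnn = RNN(input_size=3, hidden_size=5, output_size=2)
x = np.array([1.0, 0.0, 0.0])
y1 = rnn.step(x)
y2 = rnn.step(x)  # same input, different output: the hidden state has changed
```

Feeding the same `x` twice gives two different outputs, which is the memory effect in action. Stacking would feed the output of one such object into the `step` of the next.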

## LSTMs
* **Long Short-Term Memory** Networks
* Subclass of RNNs
* "Forget" things, better remember more important ones
* $h_t=...$ gets more complicated
* Sampling temperature in (0, 1] (scales the scores before the softmax)
 * near 0: confident and conservative, but repetitive
 * near 1: diverse, but prone to errors
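
A minimal sketch of temperature sampling, assuming the network emits one raw score (logit) per candidate character. The function name `softmax_with_temperature` and the example logits are made up for illustration.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    # Divide the scores by the temperature, then apply a numerically stable softmax
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])      # hypothetical scores for three characters

p_cold = softmax_with_temperature(logits, temperature=0.1)  # almost all mass on the top score
p_warm = softmax_with_temperature(logits, temperature=1.0)  # mass spread more evenly

# Sample the next character index from the chosen distribution
rng = np.random.default_rng(0)
next_index = rng.choice(len(logits), p=p_warm)
```

At low temperature the sampler almost always picks the top-scoring character, which explains the repetitive output; at temperature 1 the less likely characters get a real chance.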

## Various Applications
* Proof-of-concept applications
 * Writing
 * Code
 * Research Papers
 * Baby names
 * etc.
* Real-world applications
 * Translation
 * Speech to text
 * Handwritten text
 * Image/video classification & captioning
 * Memories and attention

## Understanding the Learning Process
* Learns simple concepts, then harder ones
 * Spaces
 * Commas before spaces
 * Periods at sentence end
 * Common, short words
 * Longer words
 * Quotation formatting, advanced punctuation
 * Proper spelling, names
* Highly certain on "http://www.", etc.
* Hidden neurons react to certain stimuli and remember certain things
 * Inside containers (quotations, parentheses, markdown, urls), to close them off later
 * Time since container start or line break
 * Remaining ~95% non-interpretable

# 4. [Deep Reinforcement Learning: Pong from Pixels](http://karpathy.github.io/2016/05/31/rl/)
* By Andrej Karpathy

## Reinforcement Learning
* Recent progress in CV and RL mainly driven by compute/data/infrastructure, not algorithms
* Some algorithms:
 * DQN (Deep Q-Network)
   * A form of Q-learning with a neural network approximating the Q-function
 * PG (Policy Gradients)
   * Direct optimization of reward
* Structure
 * Input (Pixels from Pong)
 * Action (Move paddle up/down)
 * Reward (Change in game score)
 * NN trained on Input->Action
   * Side note: feed in multiple frames
* Credit assignment problem
 * What caused score change and when?
 * How to optimize weights?
 * SOLUTION: Wait until the reward arrives, then train each step toward the action it took, scaled by that reward
   * Training target is not a distribution but a one-hot vector: 1 for the performed action, 0 elsewhere
   * Positive reward encourages performed actions, negative discourages them
 * In principle applies to any task that provides a reward signal
* Like supervised, but on a changing dataset
* Alternative: discounted reward
 * Reward based on future states, but exponentially decreasing with time
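
Discounting can be sketched as a backward pass over the rewards of a single episode. The function name `discount_rewards` and the discount factor `gamma` are assumptions for illustration.

```python
import numpy as np

def discount_rewards(rewards, gamma=0.99):
    """Return, for each step, the sum of future rewards weighted by gamma**delay."""
    discounted = np.zeros(len(rewards), dtype=float)
    running = 0.0
    # Walk backwards so each step can reuse the total of the steps after it
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        discounted[t] = running
    return discounted

# One episode: the only reward comes at the last step
rewards = [0.0, 0.0, 0.0, 1.0]
returns = discount_rewards(rewards, gamma=0.5)
print(returns)  # [0.125 0.25  0.5   1.   ]
```

Steps closer to the reward get more of the credit, and steps far in the past get exponentially less.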

## Policy Gradients
* Special case of the more general *score function gradient estimator*
* Gradient to optimize parameters to increase reward
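* Loss: $\sum_i A_i \log p(y_i \mid x_i)$, where $A_i$ is the advantage (e.g. the reward) of sampled action $y_i$
 * $E_x [f(x)] = \sum_x p(x) f(x)$ (definition of the expectation, with $p$ depending on $\theta$)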
 * $\nabla_{\theta}E_x [f(x)] = E_x[f(x) \nabla_{\theta} \log p(x) ]$ (policy gradient)
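
The policy gradient identity can be checked numerically on a tiny example. The sketch below assumes $p$ is a softmax over three outcomes with parameters `theta`, and `f` is an arbitrary made-up score for each outcome; both sides are computed exactly rather than by sampling.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

theta = np.array([0.5, -1.0, 2.0])   # hypothetical policy parameters
f = np.array([1.0, 3.0, -2.0])       # hypothetical score (reward) of each outcome
p = softmax(theta)

# Right-hand side: E_x[f(x) * grad_theta log p(x)]
# For a softmax, grad_theta log p(x) = onehot(x) - p
grad_log_p = np.eye(len(theta)) - p  # row x holds grad_theta log p(x)
score_grad = (p * f) @ grad_log_p

# Left-hand side: grad_theta E_x[f(x)] by central finite differences
def expected_f(t):
    return softmax(t) @ f

eps = 1e-6
numeric_grad = np.array([
    (expected_f(theta + eps * e) - expected_f(theta - eps * e)) / (2 * eps)
    for e in np.eye(len(theta))
])

print(np.allclose(score_grad, numeric_grad, atol=1e-6))
```

In practice the expectation is not summed exactly; it is estimated from sampled actions, which is why the method needs many samples.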

## RL vs. humans
* No prior knowledge of reward system
* No intuitive prior knowledge
* Has to 'stumble upon' a strategy to discover it
* Has to experience reward often

## Non-differentiable computation
* Sampling a discrete choice among actions makes the computation non-differentiable
* PGs can update parameters related to making such choices
* SOLUTION:
 * Update differentiable parts with gradient descent
 * Update non-differentiable parts with PG

## Trainable memory
* Some networks can read/write to memory (to increase performance)
* Only soft I/O differentiable (also very slow)
* Hard I/O non-differentiable
* SOLUTION: Policy Gradients
* Comes with PG drawbacks
 * Mainly, has to stumble onto solutions in vast memory space

## Future research
* Unsupervised generative models and program induction (abstract model building)
* Reward modeled by evaluator network
* Human supervision + apprenticeship learning

## PG in practice
* Need many samples
* Takes a long time
* Try a simpler baseline first: cross-entropy method (CEM)
* Variation: TRPO (Trust Region Policy Optimization)
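
For reference, a minimal sketch of the cross-entropy method on a toy problem. The quadratic `toy_score`, the population size, the elite fraction, and the other defaults are assumptions for illustration; in RL the score would be the total reward of an episode run with the sampled parameters.

```python
import numpy as np

def cross_entropy_method(score_fn, dim, iterations=30, population=100,
                         elite_frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    std = np.full(dim, 2.0)
    n_elite = int(population * elite_frac)
    for _ in range(iterations):
        # Sample candidate parameter vectors around the current mean
        samples = rng.normal(mean, std, size=(population, dim))
        scores = np.array([score_fn(s) for s in samples])
        # Keep the best-scoring candidates and refit the Gaussian to them
        elite = samples[np.argsort(scores)[-n_elite:]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-3  # small floor so the search never fully collapses
    return mean

# Toy objective (hypothetical): highest score at params == target
target = np.array([1.0, -2.0])

def toy_score(params):
    return -np.sum((params - target) ** 2)

best = cross_entropy_method(toy_score, dim=2)
print(best)
```

CEM uses no gradients at all, which makes it a useful sanity check before investing in a full PG setup.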

# 5. [The state of Computer Vision and AI: we are really, really far away](http://karpathy.github.io/2012/10/22/state-of-computer-vision/)
* By Andrej Karpathy

## Inferring information from images
* Many potential inferences from one image
 * Humans can do in a short glance
* CORE ISSUE: deriving information from 2D pixel values
 * Scene information: structure, objects, people
 * Intuitive information: physics, how scales work
 * Social inferences: people's intents, thoughts, reactions
 * Etc.
* AI and CV currently far from being able to retrieve & synthesize this information
 * Not simply a matter of data/resources/tricky algorithms
 * Missing fundamental pieces of the puzzle
 * A long way off, but a matter of continuous small improvements

# 6. [DeepMind and Blizzard to release StarCraft II as an AI research environment](https://deepmind.com/blog/deepmind-and-blizzard-release-starcraft-ii-ai-research-environment/)
* By Oriol Vinyals (DeepMind)

## StarCraft and AI
* Collaboration between Blizzard and DeepMind to open up StarCraft II to researchers
* Games are a perfect environment for AI development
 * Complex problem solving without specific instructions
* StarCraft connects AI research to the messiness of the real world
 * In-game economy
 * Partially observable environment & scouting
   * Remembering & recalling information
 * Long-term planning