<a href="https://colab.research.google.com/github/JamieBali/CourseDocs/blob/main/Abstraction_Answers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Abstraction

Abstraction is the process of taking a problem apart to look at its components individually. In general this is an important problem-solving principle, and it is one of the most important steps towards making complex programs and scripts.

When someone asks "What is the best move to make in this chess game?", we have to take the game apart, thinking about the movements of our own pieces and the opponent's pieces, until we eventually find the best solution.

When we abstract a programming problem, we break it into the individual steps towards a solution, break those steps into specific functions, and then break those functions down into specific algorithms, until we have a lot of smaller sub-problems that are much easier to deal with.

When it comes to AI, it's not just the problems we have to abstract, but the input and output data itself. A computer isn't going to be able to look at a chess board as a whole and immediately go "ah yes, this is the move you need to make," so we instead need to break our input down into the individual tiles of the board and the individual pieces.
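
As a rough sketch of what "breaking the board down" could look like, the block below stores a position as a dictionary from tile names to piece codes. The names `board` and `pieces_for`, and the convention of uppercase for white and lowercase for black, are assumptions made for this illustration; they are not used anywhere else in this notebook.

```python
# Hypothetical representation: each occupied tile maps to a one-letter piece code.
# Assumed convention for this sketch: uppercase = white, lowercase = black.
board = {
    "e1": "K",  # white king
    "d1": "Q",  # white queen
    "e2": "P",  # white pawn
    "e8": "k",  # black king
    "e7": "p",  # black pawn
}

def pieces_for(board, white=True):
    """Return the tiles holding one side's pieces, sorted by tile name."""
    return sorted(tile for tile, piece in board.items() if piece.isupper() == white)

print(pieces_for(board, white=True))
print(pieces_for(board, white=False))
```

Once the board is a plain dictionary, questions like "which tiles hold my pieces?" become small functions instead of something the computer has to see all at once.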

We've already looked at breaking down text data last week, so let's re-implement those functions and see what processing we can do.

# Data Importing

When we did our processing last week, we only used a single block of text, but that's not going to make a very accurate AI system. As we mentioned before, AI systems often need a lot of data to make them accurate, so we need to import a lot more data.

Upload the file we provided (or more text documents if you'd like) to an "AI_Data" folder in your Google Drive and run the block below to bring it into Colab. You'll need to sign in to your Google account to allow Colab to access the file.

In [26]:
# library imports
import os

# colab link
from google.colab import drive
drive.mount('/content/gdrive')

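# determine import directory (adjust as required)
training_dir = "/content/gdrive/My Drive/AI_Data/"
filenames = os.listdir(training_dir)

# merge all data into one lowercase string, with a space after each line
fulltext = ""
for afile in filenames:
  path = os.path.join(training_dir, afile)
  if not os.path.isfile(path):
    continue  # skip sub-folders, which open() cannot read
  # encoding is assumed to be UTF-8; adjust if your files differ
  with open(path, encoding="utf-8") as instream:
    for line in instream:
      # drop trailing whitespace and stray backslashes
      line = line.rstrip()
      fulltext += (line.replace("\\", "") + " ").lower()

print(fulltext)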

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).
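
If you are working outside Colab, the same merging step can be tried on local files. The sketch below is a variant under assumptions, not the loader used in this notebook: `merge_text_files` is a hypothetical helper, the two sample files are made up for the demonstration, and lines are joined with single spaces rather than having a space appended to each.

```python
import os
import tempfile

def merge_text_files(directory):
    """Concatenate every file in `directory` into one lowercase string."""
    parts = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip sub-folders
        with open(path, encoding="utf-8") as instream:
            for line in instream:
                # strip trailing whitespace and backslashes, then lowercase
                parts.append(line.rstrip().replace("\\", "").lower())
    return " ".join(parts)

# Build two tiny files in a temporary folder to try the function out.
with tempfile.TemporaryDirectory() as tmp:
    samples = [("a.txt", "Hello World\n"), ("b.txt", "Second File\nLast line\n")]
    for name, text in samples:
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as out:
            out.write(text)
    sample = merge_text_files(tmp)

print(sample)
```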


In [38]:
def tokenize(text):
  # characters to strip out before splitting the text into words
  punctuation_marks = ['"',"'",',',':',';','-','.','!','?','“','”','…','*','(',')','’']

  for punct in punctuation_marks:
    text = text.replace(punct, "")

  # split() with no argument splits on any whitespace and drops empty strings
  return text.split()

print(tokenize(fulltext))
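
To see what tokenizing does without scrolling through a whole book, it helps to try it on one short sentence. The sketch below uses a swapped-in technique, `str.translate` with a deletion table, instead of repeated `replace` calls; `tokenize_sketch` and `PUNCTUATION` are hypothetical names, and the sample sentence is made up.

```python
# The same punctuation marks as the tokenizer above, written as one string.
PUNCTUATION = "\"',:;-.!?“”…*()’"

def tokenize_sketch(text):
    # Build a table that deletes every punctuation character in one pass.
    table = str.maketrans("", "", PUNCTUATION)
    # split() with no argument drops the empty strings left by extra spaces.
    return text.translate(table).split()

print(tokenize_sketch("the bishop said: “welcome, monsieur!”"))
# → ['the', 'bishop', 'said', 'welcome', 'monsieur']
```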



In [39]:
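def make_ngrams(tokens):
  # maps each token to a dictionary of {following token: count}
  ngram = {}

  # stop at len(tokens)-1 so the final pair (second-to-last, last) is counted too
  for i in range(len(tokens) - 1):
    if tokens[i] not in ngram:
      ngram[tokens[i]] = {}
    ngram[tokens[i]][tokens[i+1]] = ngram[tokens[i]].get(tokens[i+1], 0) + 1

  return ngram

grams = make_ngrams(tokenize(fulltext))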
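
The nested dictionary is easier to picture on a tiny input. Here is a minimal sketch of the same counting idea, assuming a made-up nine-word sentence and using `collections.Counter` in place of hand-written dictionaries; `count_bigrams` is a hypothetical name for this illustration.

```python
from collections import Counter, defaultdict

def count_bigrams(tokens):
    """Map each token to a Counter of the tokens that follow it."""
    counts = defaultdict(Counter)
    # zip pairs every token with its successor, including the final pair.
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts

tokens = "the cat sat on the mat the cat ran".split()
bigrams = count_bigrams(tokens)

print(dict(bigrams["the"]))  # → {'cat': 2, 'mat': 1}
print(dict(bigrams["cat"]))  # → {'sat': 1, 'ran': 1}
```

So "the" was followed by "cat" twice and by "mat" once, which is exactly the kind of count the full-text version stores for every word.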

In [40]:
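def normalize_ngram(ngram):
  # converts counts to probabilities in place, so each token's followers sum to 1
  for token in ngram:
    total = sum(ngram[token].values())
    for subtoken in ngram[token]:
      ngram[token][subtoken] /= total
  return ngram

# note: grams is modified in place, so norm_gram and grams are the same object
norm_gram = normalize_ngram(grams)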
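
Normalizing turns each count into the share of times a word followed the previous one: the count for that pair divided by the total count for the first word. The sketch below works this through on made-up counts; `normalize_counts` is a hypothetical helper that, unlike the function above, builds a new dictionary and leaves the original counts untouched.

```python
def normalize_counts(counts):
    """Return a new nested dict with each token's follower counts scaled to sum to 1."""
    normalized = {}
    for token, followers in counts.items():
        total = sum(followers.values())
        normalized[token] = {nxt: n / total for nxt, n in followers.items()}
    return normalized

# Made-up counts: "the" was followed by "cat" twice, "mat" once, "dog" once.
counts = {"the": {"cat": 2, "mat": 1, "dog": 1}, "cat": {"sat": 1}}
probs = normalize_counts(counts)

print(probs["the"])   # → {'cat': 0.5, 'mat': 0.25, 'dog': 0.25}
print(counts["the"])  # → {'cat': 2, 'mat': 1, 'dog': 1}
```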

In [56]:
def sort_dictionary(_dict, descending=False, return_as_dict=False):
  # sort the (key, value) pairs by value
  srt = sorted(_dict.items(), key=lambda item: item[1], reverse=descending)
  if return_as_dict:
    return dict(srt)
  else:
    return srt

sort_dictionary(norm_gram["valjean"], True, True)

{'was': 0.07407407407407407,
 'had': 0.04669887278582931,
 'and': 0.02254428341384863,
 'felt': 0.02254428341384863,
 'the': 0.02254428341384863,
 'could': 0.020933977455716585,
 'to': 0.01932367149758454,
 'said': 0.017713365539452495,
 'tried': 0.01610305958132045,
 'did': 0.01610305958132045,
 'a': 0.014492753623188406,
 'would': 0.014492753623188406,
 'saw': 0.01288244766505636,
 'turned': 0.011272141706924315,
 'watched': 0.011272141706924315,
 'shook': 0.011272141706924315,
 'looked': 0.011272141706924315,
 'smiled': 0.011272141706924315,
 'with': 0.00966183574879227,
 'but': 0.00966183574879227,
 'found': 0.00966183574879227,
 'heard': 0.00966183574879227,
 'he': 0.00966183574879227,
 'knew': 0.00966183574879227,
 'closed': 0.008051529790660225,
 'who': 0.008051529790660225,
 'clenched': 0.00644122383252818,
 'walked': 0.00644122383252818,
 'lowered': 0.00644122383252818,
 'followed': 0.004830917874396135,
 'allowed': 0.004830917874396135,
 'his': 0.004830917874396135,
 'for': 0