# Using natural language processing to model the discourse effectiveness of text in the domain of physical exercise instructions.

## Physical Exercises:
    1.)   Barbell Wrist Curls
    2.)   Bench Press
    3.)   Crunches
    4.)   Deadlift
    5.)   Dips
    6.)   Dumbbell Curls
    7.)   Dumbbell Kickbacks
    8.)   Dumbbell Shrugs
    9.)   Front Dumbbell Raises
    10.)  Front Squats
    11.)  Good Mornings
    12.)  Hanging Leg Raises
    13.)  Hyperextensions
    14.)  Incline Bench Press
    15.)  Lat Machine Pulldowns
    16.)  Leg Curls
    17.)  Leg Extensions
    18.)  Leg Press
    19.)  Lunges
    20.)  Lying Triceps Extensions
    21.)  Military Press
    22.)  Preacher Curls
    23.)  Reverse Barbell Curls
    24.)  Reverse Crunches
    25.)  Seated Triceps Press
    26.)  Squats
    27.)  Standing Barbell Curls
    28.)  Standing Calf Raises
    29.)  Straight-Leg Deadlifts
    30.)  Triceps Cable Pressdowns
    
## How is the discourse effectiveness determined?
### First, using the researcher's own judgement, "good" and "bad" exercise descriptions were identified. The researcher has an undergraduate minor in Exercise Science and felt comfortable discerning a good description from a bad one. See the References section for the sources of the descriptions.

### From there, using the Natural Language Toolkit (NLTK), metrics were computed to determine how effectively the text provided straightforward and safe instructions for performing the physical exercise. Those metrics are: word count, lexical diversity, sentence complexity, sentence similarity, and lexical patterns.

### We hypothesize the following:
###  •  Each good exercise description should be longer than the associated bad exercise description because a good description should be more descriptive.
###  •  Each good exercise description should have a greater degree of lexical diversity than the associated bad exercise description to create clear imagery of the movement.
###  •  Each good exercise description should have a lesser degree of complexity per sentence than the associated bad exercise description because less complex sentences are easier to follow.
###  •  Each good exercise description should have sentences that are less similar when compared to the associated bad exercise description.
###  •  Each good exercise description should be less similar to the other good exercise descriptions, whereas each bad exercise description should be more similar to the other bad exercise descriptions.
###  •  Each good exercise description should follow a similar lexical pattern of conveying instructions to perform the movement, whereas each bad exercise description should not have a recognizable lexical pattern.

### Imports, Function Definitions, and Variables

In [1]:
import textacy
from textacy import TextStats
import plotly
plotly.tools.set_credentials_file(username='mlein50', api_key='q1PZkZjmOXj1zAK2UJNy')
import plotly.plotly as py
import plotly.graph_objs as go
from IPython.display import IFrame
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))


def create_spaces(number_spaces):
    #  Build a padding string of the requested length (empty when the count is zero or negative)
    return " " * number_spaces


def make_word_count_array(file_directory):
    word_count_array = []
    for individualEx in exercise_list_single_line:
        with open("/Users/matthewleinhauser/Documents/NLP Research Project/Exercises/" + file_directory + "/"
                  + individualEx + ".txt", "r") as badEx:
            text = badEx.read()
            words = text.split()
            word_count_array.append(len(words))
    return word_count_array


def lexical_diversity(exercise):
    return len(set(exercise)) / len(exercise)


def make_lexical_diversity_array(file_directory):
    lexical_diversity_array = []
    for individEx in exercise_list_single_line:
        with open("/Users/matthewleinhauser/Documents/NLP Research Project/Exercises/" + file_directory +
                  "/" + individEx + ".txt", "r") as ex:
            text = ex.read()
            lexical_diversity_array.append(lexical_diversity(text))
    return lexical_diversity_array


def text_to_doc(file_directory):
    doc_array = []
    for individualEx in exercise_list_single_line:
        with open("/Users/matthewleinhauser/Documents/NLP Research Project/Exercises/" + file_directory +
                  "/" + individualEx + ".txt", "r") as exercise:
            exerciseText = exercise.read()
            docx = textacy.Doc(textacy.preprocess_text(exerciseText, lowercase=True))
            doc_array.append(docx)
    return doc_array


def get_jaccard_sim(str1, str2):
    first = set(str1.split())
    second = set(str2.split())
    intersect = first.intersection(second)
    return float(len(intersect)) / (len(first) + len(second) - len(intersect))


def make_text_similarity_array(file_directory):
    ts_array = []
    for i in exercise_list_single_line:
        with open("/Users/matthewleinhauser/Documents/NLP Research Project/Exercises/" + file_directory +
                  "/" + i + ".txt", "r") as ex1:
            text1 = ex1.read()
        avg_exercise_similarity = 0
        for j in exercise_list_single_line:
            if j == i:
                continue  #  skip comparing an exercise with itself
            with open("/Users/matthewleinhauser/Documents/NLP Research Project/Exercises/" + file_directory +
                      "/" + j + ".txt", "r") as ex2:
                text2 = ex2.read()
                avg_exercise_similarity += get_jaccard_sim(text1, text2)
        #  Average over the other exercises (29 when the list holds all 30)
        ts_array.append(avg_exercise_similarity / (len(exercise_list_single_line) - 1))
    return ts_array


#  Read the exercise names (one per line); the with-block closes the file afterwards
with open("/Users/matthewleinhauser/Documents/NLP Research Project/Exercises/exerciseList.txt", "r") as exercise_list:
    exercise_list_single_line = [line.rstrip('\n') for line in exercise_list]

# Word Count

## The word count of each good exercise description was determined by the following code

In [2]:
goodWordCountAvg = 0
goodWordCountArray = []

for indivEx in exercise_list_single_line:
    with open("/Users/matthewleinhauser/Documents/NLP Research Project/Exercises/GOOD/" 
              + indivEx + ".txt", "r") as goodEx:
        text = goodEx.read()
        words = text.split()
        wordTotal = len(words)  #  find word count
        goodWordCountArray.append(wordTotal)
        goodWordCountAvg += wordTotal
        
data = [go.Bar(
            x=goodWordCountArray,
            y=exercise_list_single_line,
            marker=dict(
                color='rgba(88, 127, 236, 1.0)'),
            orientation ='h'
)]

layout = go.Layout(
    title="Good Exercise Descriptions Word Counts",
    xaxis=dict(
        title="Number of Words",
        tickfont=dict(
            size=12)),    
    yaxis=dict(
        title="Exercise",
        tickprefix="",
        showtickprefix="all",
        tickfont=dict(
            size=10)), 
    plot_bgcolor='rgb(220, 220, 220)',
)

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='Good Exercise Word Count Bar')


## The word count of each bad exercise description was determined in the same way

In [3]:
badWordCountAvg = 0
badWordCountArray = []

#  Accumulate the word count of each bad exercise description.


for indivEx in exercise_list_single_line:
    with open("/Users/matthewleinhauser/Documents/NLP Research Project/Exercises/BAD/" 
              + indivEx + ".txt", "r") as badEx:
        text = badEx.read()
        words = text.split()
        wordTotal = len(words)  #  find word count
        badWordCountArray.append(wordTotal)
        badWordCountAvg += wordTotal


        
data = [go.Bar(
            x=badWordCountArray,
            y=exercise_list_single_line,
            marker=dict(
                color='rgba(222, 95, 95, 1.0)'),
            orientation ='h'
)]

layout = go.Layout(
    title="Bad Exercise Descriptions Word Counts",
    xaxis=dict(
        title="Number of Words",
        tickfont=dict(
            size=12)),    
    yaxis=dict(
        title="Exercise",
        tickprefix="",
        showtickprefix="all",
        tickfont=dict(
            size=10)), 
    plot_bgcolor='rgb(220, 220, 220)',
)

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='Bad Exercise Word Count Bar')

## By doing simple subtraction we can see the difference between the number of words used in good and bad exercise instructions.

### NOTE: A negative number represents the good exercise being shorter than the bad exercise

In [4]:
#  Difference equation: (Good exercise) - (Bad exercise) = difference in word count
wordCountDifferenceArray = []

goodExerciseWordCount = list((make_word_count_array("GOOD")))
badExerciseWordCount = list((make_word_count_array("BAD")))

i = 0

while i < len(badExerciseWordCount) and i < len(goodExerciseWordCount):
    wordCountDifferenceArray.append((goodExerciseWordCount[i] - badExerciseWordCount[i]))
    i += 1

if i == len(badExerciseWordCount) and i == len(goodExerciseWordCount):
    i = 0
    
#  The following lines are only for printing format on the Jupyter Notebook and in no way are a reflection of
#  good coding standards and practices.

print("Exercise\t\t |", "Word Count Difference\n", end="")
print("------------------------------------------------")

for indivEx in exercise_list_single_line:
    spaces_to_add = len(exercise_list_single_line[29]) - len(indivEx) #  padding needed to align the "|" column
    print(indivEx, end="")
    print(create_spaces(spaces_to_add), "|", wordCountDifferenceArray[i])
    i += 1

Exercise		 | Word Count Difference
------------------------------------------------
Barbell Wrist Curls      | 67
Bench Press              | 55
Crunches                 | 82
Deadlift                 | 16
Dips                     | 61
Dumbbell Curls           | 37
Dumbbell Kickbacks       | 37
Dumbbell Shrugs          | -2
Front Dumbbell Raises    | 71
Front Squats             | 1
Good Mornings            | 6
Hanging Leg Raises       | 30
Hyperextensions          | 6
Incline Bench Press      | 25
Lat Machine Pulldowns    | -29
Leg Curls                | 49
Leg Extensions           | 45
Leg Press                | 20
Lunges                   | -13
Lying Triceps Extensions | 65
Military Press           | 20
Preacher Curls           | 63
Reverse Barbell Curls    | 53
Reverse Crunches         | 88
Seated Triceps Press     | 21
Squats                   | 10
Standing Barbell Curls   | 76
Standing Calf Raises     | 100
Straight-Leg Deadlifts   | 49
Triceps Cable Pressdowns | 40


### Average Word Count

In [5]:
print("Text Category  | Word Count Avg.")
print("--------------------------------")
print("Good Exercises |    ", round(goodWordCountAvg/30))
print("Bad Exercises  |    ", round(badWordCountAvg/30))

Text Category  | Word Count Avg.
--------------------------------
Good Exercises |     113
Bad Exercises  |     75


## The above results indicate that only three bad exercise descriptions are longer than their "good" counterparts. Those three exercises are: Dumbbell Shrugs, Lat Machine Pulldowns, and Lunges.
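
The comparison above can be condensed into a few lines. The following is a minimal sketch rather than the notebook's own code: the two short descriptions and the helper names `word_count` and `word_count_differences` are made up for illustration, and the texts are held in dictionaries instead of being read from the GOOD and BAD folders.

```python
def word_count(text):
    """Count whitespace-separated tokens, as text.split() does in the notebook."""
    return len(text.split())


def word_count_differences(good_texts, bad_texts):
    """Return {exercise: good word count - bad word count} for shared exercises."""
    return {
        name: word_count(good_texts[name]) - word_count(bad_texts[name])
        for name in good_texts
        if name in bad_texts
    }


# Hypothetical stand-in descriptions (not taken from the study's corpus).
good_texts = {
    "Squats": "Stand with feet shoulder width apart and lower your hips slowly.",
    "Lunges": "Step forward and bend both knees.",
}
bad_texts = {
    "Squats": "Bend your knees and stand up.",
    "Lunges": "Take a big step forward, drop down, and then push back up again.",
}

differences = word_count_differences(good_texts, bad_texts)

# A negative difference means the "bad" description is the longer one.
bad_is_longer = [name for name, diff in differences.items() if diff < 0]

# Pad each name to the longest one so the "|" column lines up.
width = max(len(name) for name in differences)
for name, diff in differences.items():
    print(f"{name.ljust(width)} | {diff}")
```

Padding with `str.ljust` against the longest name avoids relying on a particular list index holding the widest label.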

# Lexical Diversity

## The lexical diversity of each exercise was determined using the function as described in [1].
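
One detail worth keeping in mind when reading the scores below: `len(set(x)) / len(x)` measures whatever `x` iterates over. In [1] the function is applied to a list of tokens; when a raw string is passed instead, Python iterates over its characters, so the ratio reflects distinct characters rather than distinct words. The sketch below contrasts the two on a made-up sentence. The regular-expression tokenizer is an assumption chosen for brevity, not the tokenizer used in [1] or in this notebook.

```python
import re


def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes as word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_diversity(items):
    """Ratio of distinct items to total items; 0.0 for empty input."""
    return len(set(items)) / len(items) if items else 0.0


sample = "Lower the bar. Raise the bar."  # hypothetical description

char_level = lexical_diversity(sample)            # iterates over characters
word_level = lexical_diversity(tokenize(sample))  # iterates over word tokens

print(round(char_level, 3), round(word_level, 3))
```

Because longer texts reuse the same small alphabet, a character-level ratio tends to fall as a description gets longer, which is worth considering before comparing descriptions of different lengths.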

### Good Exercises

In [6]:
goodLexicalDiversityAvg = 0

#  The following lines are only for printing format on the Jupyter Notebook and in no way are a reflection of
#  good coding standards and practices.

print("Exercise\t\t |", "Lexical Diversity\n", end="")
print("------------------------------------------------")

for indivEx in exercise_list_single_line:
    with open("/Users/matthewleinhauser/Documents/NLP Research Project/Exercises/GOOD/" + indivEx + ".txt", "r") as good:
        text = good.read()
        spaces_to_add = len(exercise_list_single_line[29]) - len(indivEx)
        lexical_diversity_value = lexical_diversity(text)
        goodLexicalDiversityAvg += lexical_diversity_value
        print(indivEx,  end='')
        print(create_spaces(spaces_to_add), "|", lexical_diversity_value)
        
        
goodLexicalDiversityAvg /= 30


Exercise		 | Lexical Diversity
------------------------------------------------
Barbell Wrist Curls      | 0.046008119079837616
Bench Press              | 0.052478134110787174
Crunches                 | 0.043173862310385065
Deadlift                 | 0.059233449477351915
Dips                     | 0.05037037037037037
Dumbbell Curls           | 0.046130952380952384
Dumbbell Kickbacks       | 0.0380952380952381
Dumbbell Shrugs          | 0.1
Front Dumbbell Raises    | 0.04240766073871409
Front Squats             | 0.05154639175257732
Good Mornings            | 0.09302325581395349
Hanging Leg Raises       | 0.09602649006622517
Hyperextensions          | 0.08080808080808081
Incline Bench Press      | 0.0658682634730539
Lat Machine Pulldowns    | 0.08814589665653495
Leg Curls                | 0.05752961082910321
Leg Extensions           | 0.05255023183925812
Leg Press                | 0.07045454545454545
Lunges                   | 0.048154093097913325
Lying Triceps Extensions | 0.0506634499

### Bad Exercises

In [7]:
badLexicalDiversityAvg = 0

#  The following lines are only for printing format on the Jupyter Notebook and in no way are a reflection of
#  good coding standards and practices.

print("Exercise\t\t |", "Lexical Diversity\n", end="")
print("------------------------------------------------")

for indivEx in exercise_list_single_line:
    with open("/Users/matthewleinhauser/Documents/NLP Research Project/Exercises/BAD/" + indivEx + ".txt", "r") as bad:
        text = bad.read()
        spaces_to_add = len(exercise_list_single_line[29]) - len(indivEx)
        lexical_diversity_value = lexical_diversity(text)
        badLexicalDiversityAvg += lexical_diversity_value
        print(indivEx,  end='')
        print(create_spaces(spaces_to_add), "|", lexical_diversity_value)
        
        
badLexicalDiversityAvg /= 30


Exercise		 | Lexical Diversity
------------------------------------------------
Barbell Wrist Curls      | 0.10497237569060773
Bench Press              | 0.08743169398907104
Crunches                 | 0.07673267326732673
Deadlift                 | 0.06276150627615062
Dips                     | 0.08357348703170028
Dumbbell Curls           | 0.06622516556291391
Dumbbell Kickbacks       | 0.047244094488188976
Dumbbell Shrugs          | 0.08333333333333333
Front Dumbbell Raises    | 0.07004830917874397
Front Squats             | 0.05110497237569061
Good Mornings            | 0.09688581314878893
Hanging Leg Raises       | 0.23275862068965517
Hyperextensions          | 0.0855457227138643
Incline Bench Press      | 0.08533333333333333
Lat Machine Pulldowns    | 0.06286836935166994
Leg Curls                | 0.08641975308641975
Leg Extensions           | 0.07192575406032482
Leg Press                | 0.1
Lunges                   | 0.04978038067349927
Lying Triceps Extensions | 0.0625
Military 

## By doing simple subtraction we can see the difference between the lexical diversity used in good and bad exercise instructions.

### NOTE: A negative number represents the good exercise being less lexically diverse than the bad exercise

In [8]:
lexicalDiversityDifferenceArray = []

goodExerciseLexicalDiversity = list((make_lexical_diversity_array("GOOD")))
badExerciseLexicalDiversity = list((make_lexical_diversity_array("BAD")))

i = 0
while i < len(badExerciseLexicalDiversity) and i < len(goodExerciseLexicalDiversity):
    lexicalDiversityDifferenceArray.append((goodExerciseLexicalDiversity[i] - badExerciseLexicalDiversity[i]))
    i += 1

if i == len(badExerciseLexicalDiversity) and i == len(goodExerciseLexicalDiversity):
    i = 0
    
#  The following lines are only for printing format on the Jupyter Notebook and in no way are a reflection of
#  good coding standards and practices.

print("Exercise\t\t |", "Lexical Diversity Difference\n", end="")
print("------------------------------------------------")

for indivEx in exercise_list_single_line:
    spaces_to_add = len(exercise_list_single_line[29]) - len(indivEx)
    print(indivEx, end="")
    print(create_spaces(spaces_to_add), "|", lexicalDiversityDifferenceArray[i])
    i += 1


Exercise		 | Lexical Diversity Difference
------------------------------------------------
Barbell Wrist Curls      | -0.058964256610770115
Bench Press              | -0.034953559878283864
Crunches                 | -0.033558810956941666
Deadlift                 | -0.0035280567987987094
Dips                     | -0.03320311666132991
Dumbbell Curls           | -0.02009421318196153
Dumbbell Kickbacks       | -0.009148856392950877
Dumbbell Shrugs          | 0.016666666666666677
Front Dumbbell Raises    | -0.027640648440029877
Front Squats             | 0.0004414193768867078
Good Mornings            | -0.0038625573348354397
Hanging Leg Raises       | -0.13673213062342998
Hyperextensions          | -0.004737641905783491
Incline Bench Press      | -0.019465069860279433
Lat Machine Pulldowns    | 0.025277527304865016
Leg Curls                | -0.028890142257316537
Leg Extensions           | -0.019375522221066706
Leg Press                | -0.029545454545454555
Lunges                   | -0.

### Average Lexical Diversity Scores

In [9]:
#  The following lines are only for printing format on the Jupyter Notebook and in no way are a reflection of
#  good coding standards and practices.

print("Text Category  | Lexical Diversity Score Avg.")
print("---------------------------------------------")
print("Good Exercises |\t", round(goodLexicalDiversityAvg * 100, 3))
print("Bad Exercises  |\t", round(badLexicalDiversityAvg * 100, 3))

Text Category  | Lexical Diversity Score Avg.
---------------------------------------------
Good Exercises |	 5.877
Bad Exercises  |	 8.028


## The above results indicate that only three good exercise descriptions are more lexically diverse than their "bad" counterparts, contrary to the hypothesis. Those three exercises are: Dumbbell Shrugs, Front Squats, and Lat Machine Pulldowns.

# Text Complexity

## To measure text complexity, the researcher used four readability metrics as defined in [11]; the functions used are found in the API of [10]. Those metrics are: Flesch Reading Ease, Flesch-Kincaid Grade Level, Gunning-Fog Score, and Automated Readability Index.

### NOTE: These readability metrics were rounded to the nearest integer, and the Grade Level and Automated Readability Index values were capped at 12
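
For readers unfamiliar with these scores, the sketch below writes out the commonly published formulas in terms of raw counts. It is an illustration only and not the implementation inside the library from [10]: the function names, the `capped` helper, and the counts in the example are assumptions, and counting syllables or "complex" words (three or more syllables) is left out.

```python
def flesch_reading_ease(n_words, n_sents, n_syllables):
    """Higher scores indicate easier text."""
    return 206.835 - 1.015 * (n_words / n_sents) - 84.6 * (n_syllables / n_words)


def flesch_kincaid_grade_level(n_words, n_sents, n_syllables):
    """Approximate U.S. school grade needed to follow the text."""
    return 0.39 * (n_words / n_sents) + 11.8 * (n_syllables / n_words) - 15.59


def gunning_fog_index(n_words, n_sents, n_complex_words):
    """n_complex_words: number of words with three or more syllables."""
    return 0.4 * ((n_words / n_sents) + 100 * (n_complex_words / n_words))


def automated_readability_index(n_chars, n_words, n_sents):
    """Uses characters per word instead of syllables."""
    return 4.71 * (n_chars / n_words) + 0.5 * (n_words / n_sents) - 21.43


def capped(score, ceiling=12):
    """Round to the nearest integer and cap at the ceiling, as the notebook does."""
    return min(round(score), ceiling)


# Hypothetical counts for a single description.
counts = dict(n_words=100, n_sents=10, n_syllables=150, n_complex_words=10, n_chars=450)

ease = flesch_reading_ease(counts["n_words"], counts["n_sents"], counts["n_syllables"])
grade = flesch_kincaid_grade_level(counts["n_words"], counts["n_sents"], counts["n_syllables"])
fog = gunning_fog_index(counts["n_words"], counts["n_sents"], counts["n_complex_words"])
ari = automated_readability_index(counts["n_chars"], counts["n_words"], counts["n_sents"])

print(round(ease), capped(grade), round(fog), capped(ari))
```

All four formulas depend on average sentence length, so shortening sentences lowers the three grade-style scores and raises Reading Ease.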

### Good Exercise Readability Metric Scores

In [10]:
good_reading_ease_avg = 0
good_grade_level_avg = 0
good_gunning_fog_avg = 0
good_ari_avg = 0

goodExerciseDocArray = list((text_to_doc("GOOD")))

i = 0

while i < len(goodExerciseDocArray):
    ts1 = TextStats(goodExerciseDocArray[i])
    goodExerciseDocArray[i] = list((round(ts1.flesch_reading_ease),
                                    round(ts1.flesch_kincaid_grade_level),
                                    round(ts1.gunning_fog_index),
                                    round(ts1.automated_readability_index)))
    if goodExerciseDocArray[i][1] > 12:
        goodExerciseDocArray[i][1] = 12
    if goodExerciseDocArray[i][3] > 12:
        goodExerciseDocArray[i][3] = 12
    good_reading_ease_avg += goodExerciseDocArray[i][0]
    good_grade_level_avg += goodExerciseDocArray[i][1]
    good_gunning_fog_avg += goodExerciseDocArray[i][2]
    good_ari_avg += goodExerciseDocArray[i][3]
    i += 1

if i == len(goodExerciseDocArray):
    good_reading_ease_avg /= 30
    good_grade_level_avg /= 30
    good_gunning_fog_avg /= 30
    good_ari_avg /= 30
    i = 0

#  The following lines are only for printing format on the Jupyter Notebook and in no way are a reflection of
#  good coding standards and practices.

print("Exercise\t\t | Reading Ease | Grade Level | Gunning-Fog Score | Automated Readability Index")
print("-------------------------------------------------------------------------------------------------------")

for individualEx in exercise_list_single_line:
    character_dif = len(exercise_list_single_line[29]) - len(individualEx)
    print(individualEx, end='')
    print(create_spaces(character_dif), "|\t", goodExerciseDocArray[i][0], end='')
    character_dif = len(str(goodExerciseDocArray[4][0])) - len(str(goodExerciseDocArray[i][0]))
    print(create_spaces(character_dif), "    |      ", goodExerciseDocArray[i][1], end='')
    character_dif = len(str(goodExerciseDocArray[4][1])) - len(str(goodExerciseDocArray[i][1]))
    print(create_spaces(character_dif), "   |\t     ", goodExerciseDocArray[i][2], end='')
    character_dif = len(str(goodExerciseDocArray[4][2])) - len(str(goodExerciseDocArray[i][2]))
    print(create_spaces(character_dif), "\t  |\t    ", goodExerciseDocArray[i][3])
    i += 1


Exercise		 | Reading Ease | Grade Level | Gunning-Fog Score | Automated Readability Index
-------------------------------------------------------------------------------------------------------
Barbell Wrist Curls      |	 81     |       7     |	      10 	  |	     9
Bench Press              |	 81     |       6     |	      8  	  |	     7
Crunches                 |	 80     |       6     |	      9  	  |	     8
Deadlift                 |	 85     |       5     |	      8  	  |	     6
Dips                     |	 65     |       12    |	      14 	  |	     12
Dumbbell Curls           |	 67     |       10    |	      12 	  |	     12
Dumbbell Kickbacks       |	 76     |       7     |	      9  	  |	     9
Dumbbell Shrugs          |	 90     |       4     |	      6  	  |	     4
Front Dumbbell Raises    |	 74     |       10    |	      12 	  |	     12
Front Squats             |	 86     |       5     |	      8  	  |	     5
Good Mornings            |	 82     |       6     |	      9  	  |	     8
Hanging Leg

### Bad Exercise Readability Metric Scores

In [11]:
bad_reading_ease_avg = 0
bad_grade_level_avg = 0
bad_gunning_fog_avg = 0
bad_ari_avg = 0

badExerciseDocArray = list((text_to_doc("BAD")))

i = 0

while i < len(badExerciseDocArray):
    ts2 = TextStats(badExerciseDocArray[i])
    badExerciseDocArray[i] = list((round(ts2.flesch_reading_ease),
                                   round(ts2.flesch_kincaid_grade_level),
                                   round(ts2.gunning_fog_index),
                                   round(ts2.automated_readability_index)))
    if badExerciseDocArray[i][1] > 12:
        badExerciseDocArray[i][1] = 12
    if badExerciseDocArray[i][3] > 12:
        badExerciseDocArray[i][3] = 12
    bad_reading_ease_avg += badExerciseDocArray[i][0]
    bad_grade_level_avg += badExerciseDocArray[i][1]
    bad_gunning_fog_avg += badExerciseDocArray[i][2]
    bad_ari_avg += badExerciseDocArray[i][3]
    i += 1

if i == len(badExerciseDocArray):
    bad_reading_ease_avg /= 30
    bad_grade_level_avg /= 30
    bad_gunning_fog_avg /= 30
    bad_ari_avg /= 30
    i = 0

#  The following lines are only for printing format on the Jupyter Notebook and in no way are a reflection of
#  good coding standards and practices.

print("Exercise\t\t | Reading Ease | Grade Level | Gunning-Fog Score | Automated Readability Index")
print("-------------------------------------------------------------------------------------------------------")

for individualEx in exercise_list_single_line:
    character_dif = len(exercise_list_single_line[29]) - len(individualEx)
    print(individualEx, end='')
    print(create_spaces(character_dif), "|\t", badExerciseDocArray[i][0], end='')
    character_dif = len(str(badExerciseDocArray[8][0])) - len(str(badExerciseDocArray[i][0]))
    print(create_spaces(character_dif), "    |      ", badExerciseDocArray[i][1], end='')
    character_dif = len(str(badExerciseDocArray[8][1])) - len(str(badExerciseDocArray[i][1]))
    print(create_spaces(character_dif), "   |\t     ", badExerciseDocArray[i][2], end='')
    character_dif = len(str(badExerciseDocArray[8][2])) - len(str(badExerciseDocArray[i][2]))
    print(create_spaces(character_dif), "\t  |\t    ", badExerciseDocArray[i][3])
    i += 1


Exercise		 | Reading Ease | Grade Level | Gunning-Fog Score | Automated Readability Index
-------------------------------------------------------------------------------------------------------
Barbell Wrist Curls      |	 84     |       6     |	      9 	  |	     7
Bench Press              |	 91     |       3     |	      7 	  |	     4
Crunches                 |	 82     |       6     |	      9 	  |	     6
Deadlift                 |	 80     |       6     |	      9 	  |	     8
Dips                     |	 71     |       8     |	      12 	  |	     10
Dumbbell Curls           |	 78     |       8     |	      10 	  |	     10
Dumbbell Kickbacks       |	 73     |       8     |	      10 	  |	     9
Dumbbell Shrugs          |	 72     |       7     |	      8 	  |	     9
Front Dumbbell Raises    |	 70     |       10    |	      12 	  |	     12
Front Squats             |	 75     |       6     |	      8 	  |	     6
Good Mornings            |	 77     |       5     |	      6 	  |	     6
Hanging Leg Raises

### Readability Metrics Averages

In [12]:
good_reading_ease_avg = round(good_reading_ease_avg, 3)
good_grade_level_avg = round(good_grade_level_avg)
good_gunning_fog_avg = round(good_gunning_fog_avg)
good_ari_avg = round(good_ari_avg)
bad_reading_ease_avg = round(bad_reading_ease_avg, 3)
bad_grade_level_avg = round(bad_grade_level_avg)
bad_gunning_fog_avg = round(bad_gunning_fog_avg)
bad_ari_avg = round(bad_ari_avg)

#  The following lines are only for printing format on the Jupyter Notebook and in no way are a reflection of
#  good coding standards and practices.

print("Text Category  | Reading Ease Avg. | Grade Level Avg. | Gunning-Fog Score Avg. | "
      "Automated Readability Index Avg.")
print("-----------------------------------------------------------------------------------------------------"
      "------------")
print("Good Exercises |      ", good_reading_ease_avg, "\t   |\t   ", good_grade_level_avg, "        |\t\t",
      good_gunning_fog_avg, "\t       |\t", good_ari_avg)
print("Bad Exercises  |      ", bad_reading_ease_avg, "\t   |\t   ", bad_grade_level_avg, "        |\t\t",
      bad_gunning_fog_avg, "\t       |\t", bad_ari_avg)

Text Category  | Reading Ease Avg. | Grade Level Avg. | Gunning-Fog Score Avg. | Automated Readability Index Avg.
-----------------------------------------------------------------------------------------------------------------
Good Exercises |       77.933 	   |	    7         |		 10 	       |	 9
Bad Exercises  |       78.6 	   |	    6         |		 9 	       |	 8


# Text Similarity

## To measure text similarity, the researcher used a few different metrics. The first metric is Jaccard similarity, which is defined both as a term and as a Python function in [12]. The results for each exercise under the sub-headings "Good Exercise Text Similarity Metric Scores" and "Bad Exercise Text Similarity Metric Scores" are the average text similarity of that exercise compared to each of the other 29 exercises.
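
As a small worked example, the Jaccard similarity of two texts is the number of distinct words they share divided by the number of distinct words appearing in either. The sketch below applies that to three made-up snippets and averages each snippet's similarity to the others. The names `jaccard_similarity` and `mean_similarity_to_others` and the sample texts are assumptions for illustration; the notebook's own functions read the descriptions from disk.

```python
def jaccard_similarity(text_a, text_b):
    """|A & B| / |A | B| over the sets of whitespace-separated words."""
    words_a, words_b = set(text_a.split()), set(text_b.split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def mean_similarity_to_others(texts):
    """For each named text, average its Jaccard similarity to every other text."""
    result = {}
    for name, text in texts.items():
        scores = [
            jaccard_similarity(text, other_text)
            for other_name, other_text in texts.items()
            if other_name != name  # skip the self-comparison
        ]
        result[name] = sum(scores) / len(scores) if scores else 0.0
    return result


# Hypothetical snippets (not taken from the study's corpus).
texts = {
    "A": "lift the bar slowly",
    "B": "lower the bar slowly",
    "C": "bend your knees",
}

averages = mean_similarity_to_others(texts)
for name, score in averages.items():
    print(name, round(score, 3))
```

Here A and B share three of five distinct words (0.6) and neither shares a word with C, so A and B each average 0.3 and C averages 0.0.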

### Good Exercise Text Similarity Metric Scores

In [13]:
text_similarity_array = []

good_exercise_similarity_array = list((make_text_similarity_array("GOOD")))
bad_exercise_similarity_array = list((make_text_similarity_array("BAD")))


### Bad Exercise Text Similarity Metric Scores

### Text Similarity Metrics Averages

In [14]:
good_similarity_total = 0
bad_similarity_total = 0
k = 0

while k < len(exercise_list_single_line):
    good_similarity_total += float(good_exercise_similarity_array[k])
    bad_similarity_total += float(bad_exercise_similarity_array[k])
    k += 1
    
good_similarity_avg = good_similarity_total/len(exercise_list_single_line)
bad_similarity_avg = bad_similarity_total/len(exercise_list_single_line)

print(good_similarity_avg)
print(bad_similarity_avg)

0.15425529993832607
0.17716052153679454


# References

## Books Used:
### [1] Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python. Beijing: O'Reilly Media.
### [2] Schwarzenegger, A., & Dobbins, B. (2012). The New Encyclopedia of Modern Bodybuilding. Simon & Schuster USA.
### [3] Price, R. G. (2014). Ultimate Guide to Weight Training for Running. Chicago, IL: Price World Publishing.
### [4] Price, R. G. (2004). The Ultimate Guide to Weight Training for Baseball and Softball. Cleveland, OH: Price World Enterprises.
### [5] Price, R. G. (2006). Ultimate Guide to Weight Training for Golf. Cleveland, OH: Price World Enterprises.
### [6] Little, J. R. (2008). Beginning Bodybuilding: Real Muscle/Real Fast. New York, NY: McGraw-Hill.
### [7] Zyzz's Bodybuilding Bible. (2011).
### [8] Brungardt, K. (1999). The Complete Book of Abs. New York, NY: Random House.
### [9] Explosion AI. (2016). Architecture · spaCy API Documentation. Retrieved from https://spacy.io/api/
### [10] Chartbeat, Inc. (2016). API Reference. Retrieved from https://chartbeat-labs.github.io/textacy/api_reference.html
### [11] Conzett, L. (2017, October 10). What are Readability Metrics? Retrieved from https://raven.zendesk.com/hc/en-us/articles/202308564-What-are-Readability-Metrics-