# Paper Grading System

#### The paper grading system takes a student's answer as input and compares it with an ideal answer to assign a grade. Both answers are tokenized and reduced to root words before they are compared.
#### Semantic similarity between words is measured with a WordNet-based similarity index (Wu-Palmer similarity), which returns a score between 0 and 1. Synonyms are matched through the WordNet ontology, so two words that share a sense score 1.0. Hence, an answer that conveys the same meaning with different words can still receive full marks.

#### Steps followed in processing:
##### 1. Read data from file
##### 2. Tokenize the text, then identify and remove the stopwords
##### 3. Use the lemmatizer to reduce each word to its root form (example: "drinking" becomes "drink" when lemmatized as a verb)
##### 4. Calculate the similarity index and, based on a threshold, return the grade.
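
Step 4 can be sketched as a small function. The cut-offs 0.9 and 0.6 are the ones used in the final cell of this notebook; the function name `grade_from_similarity` and its parameter names are hypothetical and only serve the illustration.

```python
def grade_from_similarity(similarity_index, full=0.9, partial=0.6):
    """Map a similarity index in [0, 1] to a grade label.

    `full` and `partial` mirror the thresholds in the notebook's final cell;
    the function itself is an illustrative helper, not part of the notebook.
    """
    if similarity_index > full:
        return "Full Grade"
    if similarity_index >= partial:
        return "Partial Grade"
    return "Low Grade"


for score in (0.95, 0.75, 0.40):
    print(score, grade_from_similarity(score))
```

Keeping the thresholds as parameters makes it easier to tune them per question without touching the comparison logic.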

## 1. Import Libraries

In [106]:
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords, wordnet
from nltk.stem import WordNetLemmatizer
from itertools import product
import numpy
import pandas as pd

# Download the NLTK resources used below (skipped if already up to date)
nltk.download('wordnet')
nltk.download('stopwords')
nltk.download('punkt')


[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\bhush\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\bhush\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\bhush\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


## 2. Read documents or strings

In [107]:
with open(r'C:\Users\bhush\OneDrive\Desktop\Kaggle\Paper Grading\Ans1.txt', 'r') as file_1:
    str1 = file_1.read().replace('\n', '')
    
with open(r'C:\Users\bhush\OneDrive\Desktop\Kaggle\Paper Grading\Ans2.txt', 'r') as file_2:
    str2 = file_2.read().replace('\n', '')
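
Replacing `'\n'` with an empty string can fuse the last word of one line with the first word of the next (for example `cars.When`), and such a fused token may later be dropped by the `isalnum()` check. A minimal sketch of an alternative that collapses newlines into single spaces is given below; `normalize_whitespace` and `read_answer` are hypothetical helper names, and UTF-8 encoding is an assumption about the answer files.

```python
from pathlib import Path


def normalize_whitespace(text):
    """Collapse every run of whitespace (including newlines) into one space."""
    return " ".join(text.split())


def read_answer(path, encoding="utf-8"):
    """Read an answer file and return its text on a single line.

    The encoding default is an assumption; adjust it to match the files.
    """
    return normalize_whitespace(Path(path).read_text(encoding=encoding))


print(normalize_whitespace("self-driving cars.\nWhen it comes to advantages"))
```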

In [108]:
# Sample strings to compare

# str1 = "Abhishek is a good boy."
# str2 = "Abhishek is not a bad boy."
#str1 = "Cat is drinking water."
#str2 = "Lions drinks flesh"
# str1 = "He loves to play football."
# str2 = "Football is his favourite sport."
# str1 = "Many consider Maradona as the best player in soccer history."
# str2 = "Maradona is one of the best soccer player."

#str1 = "A database is an organized collection of data.An ER diagram comprises of entities which are connected to each other via relationships , attributes form the basis of entity recognition."
#str2 = "Database is a strcutred group of data. An ER diagram is a collection of entities and their features which are linked using relationships ,features are the base of identifying an entity  "

# str1 = "Ballmer has been vocal in the past warning that Linux is a threat to Microsoft."
# str2 = "In the memo, Ballmer reiterated the open-source threat to Microsoft."
# str1 = "A school is a place where kids go to study."
# str2 = "School is an institution for children who want to study."
# str1 = "The world knows it has lost a heroic champion of justice and freedom."
# str2 = "The earth recognizes the loss of a valiant champion of independence and justice."
# str1 = "A cemetery is a place where dead people's bodies or their ashes are buried."
# str2 = "A graveyard is an area of land ,sometimes near a church, where dead people are buried." 


In [109]:

##---------------Defining stopwords for English Language---------------##

stop_words = set(stopwords.words("english"))

##---------------Initialising Lists---------------##
filtered_sentence1 = []
filtered_sentence2 = []
lemm_sentence1 = []
lemm_sentence2 = []
sims = []
temp1 = []
temp2 = []
simi = []
final = []
same_sent1 = []
same_sent2 = []
#ps = PorterStemmer()


## 3. Process the first document

In [110]:

##---------------Defining WordNet Lemmatizer --------------##
lemmatizer  =  WordNetLemmatizer()

##---------------Tokenizing and removing the Stopwords---------------##

for words1 in word_tokenize(str1):
    if words1 not in stop_words:
        if words1.isalnum():
            filtered_sentence1.append(words1)

##---------------Lemmatizing: Root Words---------------##

for i in filtered_sentence1:
    lemm_sentence1.append(lemmatizer.lemmatize(i))



print(lemm_sentence1)


['Machine', 'learning', 'seen', 'use', 'case', 'ranging', 'predicting', 'customer', 'behavior', 'forming', 'operating', 'system', 'come', 'advantage', 'machine', 'learning', 'help', 'enterprise', 'understand', 'customer', 'deeper', 'level', 'By', 'collecting', 'customer', 'data', 'correlating', 'behavior', 'time', 'machine', 'learning', 'algorithm', 'learn', 'association', 'help', 'team', 'tailor', 'product', 'development', 'marketing', 'initiative', 'customer', 'company', 'use', 'machine', 'learning', 'primary', 'driver', 'business', 'model', 'Uber', 'example', 'us', 'algorithm', 'match', 'driver', 'rider', 'Google', 'us', 'machine', 'learning', 'surface', 'ride', 'advertisement', 'machine', 'learning', 'come', 'disadvantage', 'First', 'foremost', 'expensive', 'Machine', 'learning', 'project', 'typically', 'driven', 'data', 'scientist', 'command', 'high', 'salary', 'These', 'project', 'also', 'require', 'software', 'infrastructure', 'also', 'problem', 'machine', 'learning', 'bias', 'A
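
The output above still contains capitalized stop words such as 'By' and 'These'. The NLTK English stop-word list is lowercase, so a case-sensitive membership test lets them through. A minimal sketch of a case-insensitive filter follows; the small `STOP_WORDS` set is a hypothetical stand-in for `stopwords.words("english")` so that the example runs without NLTK data.

```python
# Hypothetical, tiny stop-word set for illustration only; the notebook uses
# NLTK's English list, which is much longer.
STOP_WORDS = {"by", "these", "the", "is", "a", "to"}


def filter_tokens(tokens, stop_words=STOP_WORDS):
    """Keep alphanumeric tokens whose lowercase form is not a stop word."""
    return [t for t in tokens if t.isalnum() and t.lower() not in stop_words]


tokens = ["By", "collecting", "customer", "data", ",", "These", "projects"]
print(filter_tokens(tokens))  # ['collecting', 'customer', 'data', 'projects']
```

Lowercasing only for the membership test keeps the original casing of the surviving tokens, which may matter for proper nouns such as "Uber" or "Google".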

## 4. Process the second document

In [111]:

##---------------Tokenizing and removing the Stopwords---------------##

for words2 in word_tokenize(str2):
    if words2 not in stop_words:
        if words2.isalnum():
            filtered_sentence2.append(words2)

##---------------Lemmatizing: Root Words---------------##

for i in filtered_sentence2:
    lemm_sentence2.append(lemmatizer.lemmatize(i))
    
print(lemm_sentence2)

['Machine', 'learning', 'seen', 'use', 'case', 'ranging', 'predicting', 'customer', 'behavior', 'developing', 'operating', 'system', 'autonomous', 'car', 'When', 'come', 'profit', 'machine', 'learning', 'help', 'company', 'understand', 'customer', 'deeper', 'level', 'By', 'collecting', 'customer', 'data', 'correlating', 'behavior', 'time', 'machine', 'learning', 'algorithm', 'learn', 'association', 'help', 'team', 'tailor', 'product', 'development', 'marketing', 'initiative', 'meet', 'customer', 'demand', 'Some', 'company', 'use', 'machine', 'learning', 'main', 'driver', 'business', 'model', 'For', 'example', 'Uber', 'us', 'algorithm', 'match', 'driver', 'passenger', 'Google', 'us', 'machine', 'learning', 'show', 'ad', 'generate', 'search', 'ad', 'But', 'machine', 'learning', 'drawback', 'First', 'expensive', 'Machine', 'learning', 'project', 'often', 'driven', 'data', 'scientist', 'demanding', 'high', 'salary', 'These', 'project', 'also', 'require', 'software', 'infrastructure', 'expe

## 5. Filter identical words

In [113]:

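#---------------Removing the same words from the tokens----------------##
# Note: both lists are modified while they are being iterated over, so some
# elements are skipped and not every shared word is guaranteed to be removed.
identical=[]
for word1 in lemm_sentence1:
    for word2 in lemm_sentence2:
        if word1 == word2:
            if word1 in lemm_sentence1:
                lemm_sentence1.remove(word1)
            if word2 in lemm_sentence2:
                lemm_sentence2.remove(word2)
            identical.append(word1)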
    
print(identical)

['Machine', 'Machine', 'seen', 'case', 'predicting', 'behavior', 'behavior', 'operating', 'come', 'machine', 'machine', 'machine', 'machine', 'machine', 'machine', 'help', 'help', 'understand', 'deeper', 'By', 'customer', 'customer', 'customer', 'customer', 'learning', 'learning', 'learning', 'learning', 'learning', 'learning', 'learning', 'learning', 'team', 'product', 'marketing', 'company', 'company', 'company', 'driver', 'driver', 'model', 'model', 'model', 'example', 'algorithm', 'algorithm', 'Google', 'First', 'expensive', 'expensive', 'driven', 'scientist', 'high', 'These', 'also', 'also', 'software', 'problem', 'Algorithms', 'data', 'data', 'data', 'population', 'error', 'inaccurate', 'best', 'worst', 'When', 'base', 'business', 'business', 'regulatory']
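
Removing items from a list while looping over it makes the result depend on iteration order. One possible alternative, sketched below, treats each token list as a multiset and uses `collections.Counter` to pair off shared words without mutating the inputs. This is not the notebook's method and will generally produce a different `identical` list than the loop above; `split_identical` is a hypothetical name.

```python
from collections import Counter


def split_identical(words1, words2):
    """Return (identical, rest1, rest2) using multiset intersection.

    Each shared word is paired off as many times as it occurs in both lists;
    the remaining words keep their original order.
    """
    common = Counter(words1) & Counter(words2)  # min count per shared word

    def remove_common(words):
        budget = Counter(common)  # how many copies of each word to drop
        kept = []
        for w in words:
            if budget[w] > 0:
                budget[w] -= 1
            else:
                kept.append(w)
        return kept

    return list(common.elements()), remove_common(words1), remove_common(words2)


a = ["machine", "learning", "help", "enterprise", "machine"]
b = ["machine", "learning", "help", "company"]
identical, rest_a, rest_b = split_identical(a, b)
print(identical, rest_a, rest_b)
```

Because the inputs are left untouched, the function can be re-run on the same lists and always returns the same result.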


## 6. Calculate similarity index for each word

In [114]:
##---------------Similarity index calculation for each word---------------##
for word1 in lemm_sentence1:
    simi = []
    syns1 = wordnet.synsets(word1)
    for word2 in lemm_sentence2:
        sims = []
        syns2 = wordnet.synsets(word2)
        # Wu-Palmer similarity for every pair of senses of the two words
        for sense1, sense2 in product(syns1, syns2):
            d = wordnet.wup_similarity(sense1, sense2)
            if d is not None:
                sims.append(d)

        # Keep the best-matching sense pair for this word pair
        if sims:
            simi.append(max(sims))

    # Keep the best match for word1 among all words of the second document
    if simi:
        final.append(max(simi))

In [115]:
print(final)  # Best similarity score found for each remaining word of the first document

[1.0, 1.0, 0.8, 1.0, 0.9230769230769231, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.9230769230769231, 0.8571428571428571, 1.0, 0.8, 0.9333333333333333, 0.5, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.6666666666666666, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
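
The scores above come from `wordnet.wup_similarity`, the Wu-Palmer measure. In its basic form it is defined as 2 × depth(LCS) / (depth(s1) + depth(s2)), where LCS is the deepest ancestor shared by the two senses and the root has depth 1. The sketch below applies that formula to a small hand-made taxonomy. The `PARENT` table is entirely hypothetical, and NLTK's implementation handles details (such as multiple hypernym paths) that this toy version ignores.

```python
# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "feline": "animal",
    "canine": "animal",
    "cat": "feline",
    "lion": "feline",
    "dog": "canine",
}


def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while PARENT[node] is not None:
        node = PARENT[node]
        path.append(node)
    return path


def depth(node):
    """Depth of a node, counting the root as depth 1."""
    return len(path_to_root(node))


def wup(a, b):
    """Wu-Palmer similarity on the toy taxonomy."""
    ancestors_a = set(path_to_root(a))
    # Walking upward from b, the first node also above a is the deepest shared ancestor.
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(wup("cat", "lion"))  # 0.75
print(wup("cat", "dog"))   # 0.5
print(wup("cat", "cat"))   # 1.0
```

Identical concepts score 1.0, and the score falls as the shared ancestor moves closer to the root, which is why near-synonyms in the list above score between 0.8 and 1.0.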


In [116]:
##---------------Final Output---------------##

similarity_index = numpy.mean(final)
similarity_index = round(similarity_index , 2)
print("Sentence 1: ",str1)
print("")

print("--------Tokenization--------")
print(word_tokenize(str1))

print("--------After Stop Words Removal--------")
print(lemm_sentence1)
print("")
print("")

print("Sentence 2: ",str2)
print("")

print("--------Tokenization--------")
print(word_tokenize(str2))

print("--------After Stop Words Removal--------")
print(lemm_sentence2)

print("")



print("Similarity index value : ", similarity_index)

if similarity_index>0.9:
    print("Full Grade")
elif similarity_index >= 0.6:
    print("Partial Grade")
else:
    print("Low Grade")

Sentence 1:  Machine learning has seen use cases ranging from predicting customer behavior to forming the operating system for self-driving cars.When it comes to advantages, machine learning can help enterprises understand their customers at a deeper level. By collecting customer data and correlating it with behaviors over time, machine learning algorithms can learn associations and help teams tailor product development and marketing initiatives to customer demand.Some companies use machine learning as a primary driver in their business models. Uber, for example, uses algorithms to match drivers with riders. Google uses machine learning to surface the ride advertisements in searches.But machine learning comes with disadvantages. First and foremost, it can be expensive. Machine learning projects are typically driven by data scientists, who command high salaries. These projects also require software infrastructure that can be expensive.There is also the problem of machine learning bias. 