### Lab Task: Cassandra Practice 
#### To Do Steps:
1. Create a keyspace with SimpleStrategy and a replication factor of 2.
2. Create a column family that can store the following information about a student:
<ul><li>Student Roll Number</li>
<li>Student Name</li>
<li>Enrolled Course (Each student can have multiple courses)</li>
<li>Semester (e.g., Spring 2022)</li>
<li>Percentage of Marks Obtained By Students in Each Enrolled Course</li>
<li>Grade Obtained By Students in Each Enrolled Course</li>
<li>GPA Obtained By Students in Each Enrolled Course</li>
<li>Whether a student is Fresh or Repeating the Course</li></ul>
3. Now insert data for 5 distinct courses, and zero or one repeated course, for each student.
4. There should be data for at least 10 students.
5. Now explore the following insights from the data:
<ul><li>Display students semester-wise in sorted order. For example, all students in Spring 2022 sorted by the percentage they obtained.</li>
<li>Display only students that are enrolled in a specific semester.</li>
<li>Display only students that are enrolled in a specific course.</li>
<li>Students having grade 'A' in a specific course in a given semester. (For example, who scored 'A' in Deep Learning?)</li>
<li>Students who are repeating a specific course in a given semester.</li>
<li>Total number of students who are repeating a specific course in a given semester.</li>
<li>In which course a student obtained the maximum or minimum percentage.</li>
<li>Can you calculate the CGPA of each student using the data you created?</li></ul>

In [1]:
import cassandra
from cassandra.cluster import Cluster
try: 
    cluster = Cluster(['127.0.0.1'], port=9042) 
    session = cluster.connect()
except Exception as e:
    print(e)

**Step 1**: Create a keyspace with SimpleStrategy and a replication factor of 2

In [2]:
try:
    session.execute("""
    CREATE KEYSPACE IF NOT EXISTS university 
    WITH REPLICATION = 
    { 'class' : 'SimpleStrategy', 'replication_factor' : 2 }"""
)   
    rows = session.execute("""SELECT * FROM system_schema.keyspaces""")
except Exception as e:
    print(e)
    
for row in rows:
    print (row)

Row(keyspace_name='system_auth', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '1')]))
Row(keyspace_name='system_schema', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.LocalStrategy')]))
Row(keyspace_name='system_distributed', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')]))
Row(keyspace_name='system', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.LocalStrategy')]))
Row(keyspace_name='system_traces', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '2')]))
Row(keyspace_name='university', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('repl

In [3]:
try:
    session.set_keyspace('university')
except Exception as e:
    print(e)

**Step 2**: Create a column family that can store the following information about a student:
<ul><li>Student Roll Number</li>
<li>Student Name</li>
<li>Enrolled Course (Each student can have multiple courses)</li>
<li>Semester (e.g., Spring 2022)</li>
<li>Percentage of Marks Obtained By Students in Each Enrolled Course</li>
<li>Grade Obtained By Students in Each Enrolled Course</li>
<li>GPA Obtained By Students in Each Enrolled Course</li>
<li>Whether a student is Fresh or Repeating the Course</li></ul>

In [4]:
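# semester is the partition key and percentage the clustering column,
# so rows within a semester are stored sorted by percentage.
query = """CREATE TABLE IF NOT EXISTS record (
    s_rn int, s_name text, e_course text, semester text,
    percentage float, grade text, gpa float, status text,
    PRIMARY KEY (semester, percentage));"""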

try:
    session.execute("drop table if exists record")
    session.execute(query)
except Exception as e:
    print(e)
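
One caveat about the key chosen above: in Cassandra the primary key identifies a row, and an `INSERT` whose primary key matches an existing row overwrites it. With `PRIMARY KEY (semester, percentage)`, two students who score the same percentage in the same semester would collapse into one row. The following pure-Python sketch mimics that upsert behaviour with a dictionary. The `insert` helper, the wider key, and the tied row for roll number 6 are hypothetical, and nothing here talks to a cluster.

```python
# Simulate Cassandra's upsert semantics with a dict keyed by the primary key.
# Illustration only; this does not connect to a cluster.

def insert(table, key_columns, row):
    """Store `row` under its primary-key tuple, overwriting any existing row."""
    key = tuple(row[col] for col in key_columns)
    table[key] = row

rows = [
    {"s_rn": 1, "e_course": "ML", "semester": "Spring 2022", "percentage": 90.9},
    # Hypothetical second student with the same percentage in the same semester.
    {"s_rn": 6, "e_course": "BD", "semester": "Spring 2022", "percentage": 90.9},
]

narrow_key = {}  # mirrors PRIMARY KEY (semester, percentage)
wide_key = {}    # a possible alternative that also includes s_rn and e_course
for row in rows:
    insert(narrow_key, ("semester", "percentage"), row)
    insert(wide_key, ("semester", "percentage", "s_rn", "e_course"), row)

print(len(narrow_key))  # 1: the second insert replaced the first
print(len(wide_key))    # 2: both rows survive
```

The sample data in this notebook has no such ties, so every inserted row is kept.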

**Step 3 and 4**
* Now insert data for 5 distinct courses, and zero or one repeated course, for each student.
* There should be data for at least 10 students.

In [5]:
from cassandra.query import BatchStatement

data = [[1,'S-1','ML','Spring 2022', 90.9, 'A', 4.0, 'Fresh'],
        [2,'S-2','BD','Spring 2022', 80.8, 'B', 3.9, 'Fresh'],
        [3,'S-3','DL','Spring 2022', 70.7, 'C', 3.8, 'Fresh'],
        [4,'S-4','ML','Spring 2022', 60.6, 'D', 3.7, 'Fresh'],
        [5,'S-5','BD','Spring 2022', 50.5, 'E', 3.6, 'Fresh'],
        [1,'S-1','DL','Fall 2022', 50.5, 'E', 3.6, 'Repeating'],
        [2,'S-2','ML','Fall 2022', 60.6, 'D', 3.7, 'Repeating'],
        [3,'S-3','BD','Fall 2022', 70.7, 'C', 3.8, 'Repeating'],
        [4,'S-4','DL','Fall 2022', 80.8, 'B', 3.9, 'Repeating'],
        [5,'S-5','ML','Fall 2022', 90.9, 'A', 4.0, 'Repeating']]

prepared = session.prepare("INSERT INTO record (s_rn, s_name, e_course, semester, percentage, grade, gpa, status) VALUES (?,?,?,?,?,?,?,?)")
try:
    batch = BatchStatement()
    # Bind each record's values, in column order, to the prepared INSERT
    for record in data:
        batch.add(prepared, tuple(record))
    
    session.execute(batch)
    rows = session.execute('SELECT * FROM record')
except Exception as e:
    print(e)
    
for row in rows:
    print (row)

Row(semester='Spring 2022', percentage=50.5, e_course='BD', gpa=3.5999999046325684, grade='E', s_name='S-5', s_rn=5, status='Fresh')
Row(semester='Spring 2022', percentage=60.599998474121094, e_course='ML', gpa=3.700000047683716, grade='D', s_name='S-4', s_rn=4, status='Fresh')
Row(semester='Spring 2022', percentage=70.69999694824219, e_course='DL', gpa=3.799999952316284, grade='C', s_name='S-3', s_rn=3, status='Fresh')
Row(semester='Spring 2022', percentage=80.80000305175781, e_course='BD', gpa=3.9000000953674316, grade='B', s_name='S-2', s_rn=2, status='Fresh')
Row(semester='Spring 2022', percentage=90.9000015258789, e_course='ML', gpa=4.0, grade='A', s_name='S-1', s_rn=1, status='Fresh')
Row(semester='Fall 2022', percentage=50.5, e_course='DL', gpa=3.5999999046325684, grade='E', s_name='S-1', s_rn=1, status='Repeating')
Row(semester='Fall 2022', percentage=60.599998474121094, e_course='ML', gpa=3.700000047683716, grade='D', s_name='S-2', s_rn=2, status='Repeating')
Row(semester='Fal
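
The long decimals in the output (for example `60.599998474121094` for an inserted `60.6`) come from the CQL `float` type, which is a 32-bit IEEE 754 value. A small sketch using only Python's `struct` module reproduces the round trip; the helper name `to_float32` is made up for illustration.

```python
import struct

def to_float32(value):
    """Round-trip a Python float (64-bit) through a 32-bit float encoding."""
    return struct.unpack(">f", struct.pack(">f", value))[0]

stored = to_float32(60.6)
print(stored)            # 60.599998474121094
print(round(stored, 1))  # 60.6
```

Rounding on display, or declaring the columns as `double` or `decimal`, would be possible ways to avoid the artefact.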

**Step 5**: Now explore the following insights from the data

* Display students semester-wise in sorted order. For example, all students in Spring 2022 sorted by the percentage they obtained.

In [6]:
query = "SELECT * FROM record WHERE semester = 'Spring 2022' ORDER BY percentage"
try:
    rows = session.execute(query)
except Exception as e:
    print(e)
    
for row in rows:
    print (row)

Row(semester='Spring 2022', percentage=50.5, e_course='BD', gpa=3.5999999046325684, grade='E', s_name='S-5', s_rn=5, status='Fresh')
Row(semester='Spring 2022', percentage=60.599998474121094, e_course='ML', gpa=3.700000047683716, grade='D', s_name='S-4', s_rn=4, status='Fresh')
Row(semester='Spring 2022', percentage=70.69999694824219, e_course='DL', gpa=3.799999952316284, grade='C', s_name='S-3', s_rn=3, status='Fresh')
Row(semester='Spring 2022', percentage=80.80000305175781, e_course='BD', gpa=3.9000000953674316, grade='B', s_name='S-2', s_rn=2, status='Fresh')
Row(semester='Spring 2022', percentage=90.9000015258789, e_course='ML', gpa=4.0, grade='A', s_name='S-1', s_rn=1, status='Fresh')


* Display only students that are enrolled in a specific semester.

In [7]:
query = "SELECT * FROM record WHERE semester = 'Spring 2022'"
try:
    rows = session.execute(query)
except Exception as e:
    print(e)
    
for row in rows:
    print (row)

Row(semester='Spring 2022', percentage=50.5, e_course='BD', gpa=3.5999999046325684, grade='E', s_name='S-5', s_rn=5, status='Fresh')
Row(semester='Spring 2022', percentage=60.599998474121094, e_course='ML', gpa=3.700000047683716, grade='D', s_name='S-4', s_rn=4, status='Fresh')
Row(semester='Spring 2022', percentage=70.69999694824219, e_course='DL', gpa=3.799999952316284, grade='C', s_name='S-3', s_rn=3, status='Fresh')
Row(semester='Spring 2022', percentage=80.80000305175781, e_course='BD', gpa=3.9000000953674316, grade='B', s_name='S-2', s_rn=2, status='Fresh')
Row(semester='Spring 2022', percentage=90.9000015258789, e_course='ML', gpa=4.0, grade='A', s_name='S-1', s_rn=1, status='Fresh')


* Display only students that are enrolled in a specific course.

In [8]:
query = "SELECT * FROM record WHERE e_course = 'ML' ALLOW FILTERING"
try:
    rows = session.execute(query)
except Exception as e:
    print(e)
    
for row in rows:
    print (row)

Row(semester='Spring 2022', percentage=60.599998474121094, e_course='ML', gpa=3.700000047683716, grade='D', s_name='S-4', s_rn=4, status='Fresh')
Row(semester='Spring 2022', percentage=90.9000015258789, e_course='ML', gpa=4.0, grade='A', s_name='S-1', s_rn=1, status='Fresh')
Row(semester='Fall 2022', percentage=60.599998474121094, e_course='ML', gpa=3.700000047683716, grade='D', s_name='S-2', s_rn=2, status='Repeating')
Row(semester='Fall 2022', percentage=90.9000015258789, e_course='ML', gpa=4.0, grade='A', s_name='S-5', s_rn=5, status='Repeating')
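
Because `e_course` is not part of the primary key, the query above needs `ALLOW FILTERING`, which may force Cassandra to read every partition and discard non-matching rows. That is tolerable for ten rows but not for large tables. A common alternative is a second table whose partition key is the column being searched. The sketch below assumes a hypothetical `record_by_course` table and helper names; it only needs an object with an `execute` method, such as the `session` created earlier.

```python
# Hypothetical second table partitioned by course, so that course lookups
# no longer need ALLOW FILTERING. Table and function names are illustrative.
CREATE_BY_COURSE = """
CREATE TABLE IF NOT EXISTS record_by_course (
    e_course text, semester text, s_rn int, s_name text,
    percentage float, grade text, gpa float, status text,
    PRIMARY KEY ((e_course), semester, s_rn)
)"""

SELECT_BY_COURSE = "SELECT * FROM record_by_course WHERE e_course = %s"

def create_course_table(session):
    """Create the course-partitioned table if it does not exist yet."""
    session.execute(CREATE_BY_COURSE)

def students_in_course(session, course):
    """Return all rows for one course; the query targets a single partition."""
    return session.execute(SELECT_BY_COURSE, (course,))
```

With this design, every record would have to be written to both `record` and `record_by_course`.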


* Students having grade 'A' in a specific course in a given semester. (For example, who scored 'A' in Deep Learning?)

In [9]:
query = "SELECT * FROM record WHERE semester = 'Spring 2022' and e_course ='ML' and grade = 'A' ALLOW FILTERING"
try:
    rows = session.execute(query)
except Exception as e:
    print(e)
    
for row in rows:
    print (row)

Row(semester='Spring 2022', percentage=90.9000015258789, e_course='ML', gpa=4.0, grade='A', s_name='S-1', s_rn=1, status='Fresh')


* Students who are repeating a specific course in a given semester.

In [10]:
query = "SELECT * FROM record WHERE semester = 'Fall 2022' and e_course ='ML' and status = 'Repeating' ALLOW FILTERING"
try:
    rows = session.execute(query)
except Exception as e:
    print(e)
    
for row in rows:
    print (row)

Row(semester='Fall 2022', percentage=60.599998474121094, e_course='ML', gpa=3.700000047683716, grade='D', s_name='S-2', s_rn=2, status='Repeating')
Row(semester='Fall 2022', percentage=90.9000015258789, e_course='ML', gpa=4.0, grade='A', s_name='S-5', s_rn=5, status='Repeating')


* Total number of students who are repeating a specific course in a given semester.

In [11]:
query = "SELECT count(*) FROM record WHERE semester = 'Fall 2022' and e_course ='ML' and status = 'Repeating' ALLOW FILTERING"
try:
    rows = session.execute(query)
except Exception as e:
    print(e)
    
for row in rows:
    print (row)

Row(count=2)


* In which course a student obtained the maximum or minimum percentage.

In [12]:
query = "SELECT MAX(percentage), MIN(percentage) FROM record WHERE s_rn = 1 ALLOW FILTERING"
try:
    rows = session.execute(query)
except Exception as e:
    print(e)
    
for row in rows:
    print (row)

Row(system_max_percentage=90.9000015258789, system_min_percentage=50.5)
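
The aggregate query returns the extreme percentages but not the course each one belongs to. One option, sketched here under the assumption that a student's rows fit comfortably in memory, is to fetch the rows and pick the extremes in Python. The pairs below are copied from the data inserted earlier for roll number 1; in the notebook they would come from a `session.execute(...)` call.

```python
# Rows for student 1 as (course, percentage), taken from the inserted data.
student_rows = [("ML", 90.9), ("DL", 50.5)]

# Compare rows by percentage and keep the whole row, so the course is retained.
best = max(student_rows, key=lambda r: r[1])
worst = min(student_rows, key=lambda r: r[1])

print(best[0])   # ML
print(worst[0])  # DL
```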


* Can you calculate the CGPA of each student using the data you created?

The data does not include credit hours for each course, so a true credit-weighted CGPA cannot be calculated. Assuming every course carries the same credit hours, the CGPA reduces to the average of the per-course GPAs, which can be computed for a student as follows.

In [13]:
query = "SELECT AVG(gpa) FROM record WHERE s_rn = 1 ALLOW FILTERING"
try:
    rows = session.execute(query)
except Exception as e:
    print(e)
    
for row in rows:
    print (row)

Row(system_avg_gpa=3.799999952316284)
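
If credit hours were available, CGPA would be the credit-weighted mean of the course GPAs, that is, sum(gpa × credits) / sum(credits). The sketch below uses the two GPA values stored for roll number 1 together with made-up credit hours (the source data has none), so the numbers are illustrative only.

```python
def cgpa(courses):
    """Credit-weighted mean of (gpa, credit_hours) pairs."""
    total_credits = sum(credits for _, credits in courses)
    if total_credits == 0:
        raise ValueError("no credit hours")
    return sum(gpa * credits for gpa, credits in courses) / total_credits

# GPAs are from the inserted data for student 1; credit hours are hypothetical.
equal_weights = [(4.0, 3), (3.6, 3)]
unequal_weights = [(4.0, 4), (3.6, 2)]

print(round(cgpa(equal_weights), 2))    # 3.8
print(round(cgpa(unequal_weights), 2))  # 3.87
```

With equal credit hours the result matches the plain average returned by `AVG(gpa)`, up to the 32-bit rounding of the stored values.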
