# **Extract and Process Form Data:**
Extracting and processing form data is the first step in matching students with their preferred startups. Students submit their company preferences through a form, and the raw form export must be converted into a structure that our matching algorithm can read. The code below shows how a sample of form output (newform.csv) is transformed into a map from each student name to an ordered list of company names.

**Expected output:**

```
{
   "Mazhar":[
      "MarsCharge",
      "JelikaLite",
      "iQure Pharma",
      "Lumos",
      "ecolytiq",
      "Datanchor",
      "Droice Labs",
      "Chimera Therapuetics"
   ],
   "Farhan":[
      "Seek AI",
      "Sachi Bioworks",
      "SageMedic",
      "ROCSOLE",
      "Datanchor",
      "Cortina Health",
      "Droice Labs",
      "ChargeWheel"
   ]
}
```




In [None]:
import csv
from tabulate import tabulate

# **Reading CSV file**
This code reads a CSV file and returns its contents as a list of rows, where each row is a list of strings. The path to the CSV file is given by the variable file_path, and the returned list is assigned to the variable data.

In [None]:
# Example usage
file_path = 'newform.csv'

def read_csv(file_path):
    data_array = []
    # newline='' lets the csv module handle line endings inside quoted fields
    with open(file_path, 'r', newline='') as csv_file:
        csv_reader = csv.reader(csv_file)
        for row in csv_reader:
            data_array.append(row)
    return data_array


data = read_csv(file_path)
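
To see what shape read_csv returns, here is a minimal sketch that parses a small in-memory sample with the same csv.reader call. The sample text and the names sample_text and rows are made up for illustration and are not taken from newform.csv.

```python
import csv
import io

# Hypothetical sample; the real newform.csv has many more columns.
sample_text = 'Name,"Track, first choice"\nAlice,Deep Tech\n'

# csv.reader accepts any iterable of lines, so StringIO stands in for a file.
rows = list(csv.reader(io.StringIO(sample_text)))

# Each row is a list of strings; the quoted comma stays inside one field.
print(rows)
```

Every value comes back as a string, which is why the later cells look up ranks with string keys such as '1' and '2'.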

# **Clean the data**
Here we remove the columns and rows that we do not need.

In [None]:
# remove unnecessary rows (first and third row);
# delete the higher index first so the lower index does not shift
del data[2]
del data[0]

# remove the first 17 columns from each row
data = [row[17:] for row in data]
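
The cleaning step can be checked on a toy table. The sketch below uses a hypothetical four-row table and skips one leading column instead of 17. The names raw, skip_cols, and drop_rows are illustrative only. It also shows an equivalent one-pass filter that does not depend on the order of deletions.

```python
# Hypothetical table; rows 0 and 2 play the role of the rows the notebook drops.
raw = [
    ["drop-a", "drop-a", "drop-a"],  # row 0: removed
    ["meta", "Name", "Track"],       # row 1: kept as the header row
    ["drop-b", "drop-b", "drop-b"],  # row 2: removed
    ["meta", "Alice", "Deep Tech"],  # row 3: kept as a data row
]
skip_cols = 1       # the notebook uses 17
drop_rows = {0, 2}  # indices of the rows to discard

# In-place version, mirroring the notebook: delete the higher index first.
table = [row[:] for row in raw]
del table[2]
del table[0]
cleaned = [row[skip_cols:] for row in table]

# One-pass alternative: filter by original index, so deletion order is irrelevant.
cleaned_alt = [row[skip_cols:] for i, row in enumerate(raw) if i not in drop_rows]

print(cleaned)
print(cleaned == cleaned_alt)
```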

# **Creating Preference Dictionary:**


* The first line creates a list of student names.
* The extract_header function takes a string argument, splits it on the hyphen, and returns the second element with surrounding whitespace removed. For example, the input "Click to write the question text - Life Sciences" returns "Life Sciences".
* The create_preference_list function builds one dictionary per student, with the rank of preference as the key and the track or company name (taken from the column header) as the value, and returns these dictionaries as a list.

In [None]:
# create a list of student names from the first column of the data_array
student_names = [row[0] for row in data[1:]]

def extract_header(string):
    # Split the string on the hyphen and get the second element
    extracted_header = string.split('-')[1].strip()

    # Return the extracted text
    return extracted_header

def create_preference_list(data_array, start_col, end_col):
    preference_list = []

    for i in range(1, len(data_array)):  # skip header row
        pref_dict = dict()

        for j in range(start_col, end_col):
            if data_array[i][j]:
                pref_dict[data_array[i][j]] = extract_header(data_array[0][j])

        preference_list.append(pref_dict)

    return preference_list

track_pref_col_start = 2
track_pref_col_end = 5
student_track_pref_list = create_preference_list(data, track_pref_col_start, track_pref_col_end)

deep_tech_pref_col_start = 6
deep_tech_pref_col_end = 27
student_deep_tech_pref_list = create_preference_list(data, deep_tech_pref_col_start, deep_tech_pref_col_end)

digital_tech_pref_col_start = 28
digital_tech_pref_col_end = 49
student_digital_tech_pref_list = create_preference_list(data, digital_tech_pref_col_start, digital_tech_pref_col_end)

life_sciences_pref_col_start = 50
life_sciences_pref_col_end = 66
student_life_sciences_pref_list = create_preference_list(data, life_sciences_pref_col_start, life_sciences_pref_col_end)
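
To make the returned structure concrete, here is a small sketch that runs the two functions on a toy table. The functions are restated so the block runs on its own. The table toy, its header text, and the student names are made up for illustration.

```python
def extract_header(string):
    # Keep the text after the first hyphen, e.g. "... - Deep Tech" -> "Deep Tech"
    return string.split('-')[1].strip()

def create_preference_list(data_array, start_col, end_col):
    preference_list = []
    for i in range(1, len(data_array)):      # skip header row
        pref_dict = {}
        for j in range(start_col, end_col):  # end_col itself is not included
            if data_array[i][j]:             # empty cell means "not ranked"
                pref_dict[data_array[i][j]] = extract_header(data_array[0][j])
        preference_list.append(pref_dict)
    return preference_list

# Hypothetical cleaned table: one header row, then one row per student.
toy = [
    ["Name", "Rank the tracks - Deep Tech", "Rank the tracks - Life Sciences"],
    ["Alice", "2", "1"],
    ["Bob", "1", ""],
]

prefs = create_preference_list(toy, 1, 3)
print(prefs)
# [{'2': 'Deep Tech', '1': 'Life Sciences'}, {'1': 'Deep Tech'}]
```

Note that range(start_col, end_col) stops one column before end_col, so the end values above should point one past the last column of each block.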

# **Appending Preferences:**

We build a dictionary called result, where each key is a student name and each value is a list of that student's preferred companies, ordered first by track preference and then by company preference within each track.

The outer loop iterates over each student in student_names. For each student, the second loop walks through their track preferences in rank order (retrieved from student_track_pref_list[i]). For each track, the code checks whether it is 'Digital Tech', 'Deep Tech', or 'Life Sciences' and, in the innermost loop, appends the companies from the corresponding preference list in increasing order of rank. Because the rank is used as the dictionary key, looking up the keys '1', '2', and so on in turn yields the values already sorted by preference.

Once all of the student preferences have been collected in pref_list, the code updates the result dictionary with a key-value pair where the key is the student name and the value is their pref_list.
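
The same ordering logic can be sketched more compactly by keying the per-track lists on the track name. This is an illustrative variant on toy data, not the notebook's own cell. The helper ordered_values, the dictionary company_prefs_by_track, and all company names are hypothetical. Unlike a plain .get lookup, this variant skips missing ranks instead of appending None.

```python
student_names = ["Alice"]

# Rank -> track, one dictionary per student (ranks are strings, as read from CSV).
track_prefs = [{"1": "Life Sciences", "2": "Deep Tech"}]

# Track name -> list of rank -> company dictionaries, one per student.
company_prefs_by_track = {
    "Deep Tech": [{"1": "DeepCo A", "2": "DeepCo B"}],
    "Life Sciences": [{"2": "BioCo B", "1": "BioCo A"}],
    "Digital Tech": [{}],
}

def ordered_values(rank_dict):
    # Look up "1", "2", ... in numeric order; skip any rank that is absent.
    return [rank_dict[str(r)] for r in range(1, len(rank_dict) + 1) if str(r) in rank_dict]

result = {}
for i, student in enumerate(student_names):
    pref_list = []
    for track in ordered_values(track_prefs[i]):
        per_student = company_prefs_by_track.get(track)
        if per_student is not None:
            pref_list.extend(ordered_values(per_student[i]))
    result[student] = pref_list

print(result)
# {'Alice': ['BioCo A', 'BioCo B', 'DeepCo A', 'DeepCo B']}
```

A lookup table keeps the loop body the same for every track, so adding a fourth track would only require one more dictionary entry.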

In [None]:
result = {}

for i, student in enumerate(student_names):
    pref_list = []

    for j in range(len(student_track_pref_list[i])):

        if student_track_pref_list[i].get(str(j + 1)) == 'Digital Tech':
            for k in range(len(student_digital_tech_pref_list[i])):
                pref_list.append(student_digital_tech_pref_list[i].get(str(k + 1)))

        if student_track_pref_list[i].get(str(j + 1)) == 'Deep Tech':
            for k in range(len(student_deep_tech_pref_list[i])):
                pref_list.append(student_deep_tech_pref_list[i].get(str(k + 1)))

        if student_track_pref_list[i].get(str(j + 1)) == 'Life Sciences':
            for k in range(len(student_life_sciences_pref_list[i])):
                pref_list.append(student_life_sciences_pref_list[i].get(str(k + 1)))

    result[student] = pref_list

print(result)

# print the updated 2D array as a table
print(tabulate(data, headers="firstrow", tablefmt="grid"))

{'Mazhar': ['MarsCharge', 'JelikaLite', 'iQure Pharma', 'Lumos', 'ecolytiq', 'Datanchor', 'Droice Labs', 'Chimera Therapuetics'], 'Farhan': ['Seek AI', 'Sachi Bioworks', 'SageMedic', 'ROCSOLE', 'Datanchor', 'Cortina Health', 'Droice Labs', 'ChargeWheel']}
+--------+--------------------------------+------------------------------------------------+---------------------------------------------------+----------------------------------------------------+-----------------------------------------------------+----------------------------------------------+-------------------------------------------------------+------------------------------------------+-------------------------------------------------+------------------------------------------------+--------------------------------------------+----------------------------------------------+-------------------------------------------+------------------------------------------------------+-------------------------------------------+-------------