In [None]:
"""Introduction

This project explores and analyzes a dataset of worldwide university rankings. The dataset, which 
combines three distinct global university rankings, records for each entry the institution name, 
country, national rank, quality of education, alumni employment, quality of faculty, publications, 
influence, citations, broad impact, patents, score, and year.

Purpose

The primary goal of the project is to practice using Python to read, filter, and sort the data, 
and in doing so to build a better understanding of data manipulation and analysis in Python.

The project includes three main functions:

filter_universities(data): Filters universities by the country, year, and minimum score entered 
    by the user. The matching rows are written to a new CSV file named 'filtered_universities.csv'. 
    If no university matches, a message is printed and no file is written.

above_average_by_year(data): Computes the average university score for each year and keeps the 
    universities whose score is above the average for their respective year. The output is written 
    to 'above_average.csv'.

sorted_universities(data): Keeps the universities from a country entered by the user and sorts them 
    by score in ascending order. The sorted data is written to 'sorted_universities.csv', which makes 
    it easy to compare universities within that country.
    """

In [1]:
# Function to read the CSV file
def read_data(dataset_path):
    lines = []
    with open(dataset_path, encoding='utf-8') as file:
        # Iterate over the file line by line
        for line in file:
            # Remove surrounding whitespace and split each line into a list of fields.
            # Note: a plain split(',') does not handle quoted fields that contain commas.
            lines.append(line.strip().split(','))
    # Exclude the header row from the dataset
    return lines[1:]

# Create a list from the data in the csv file
dataset = read_data('cwurData.csv')
# Print the number of rows and the first row to check if the dataset was loaded correctly
print(len(dataset), dataset[0])

2200 ['1', '"Harvard University"', 'USA', '1', '7', '9', '1', '1', '1', '1', '', '5', '100', '2012']
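
The loader above splits each line on every comma, so a quoted field that itself contains a comma would be broken into two columns. A minimal sketch of an alternative, assuming the standard-library csv module is acceptable here. The name read_rows and the sample rows are hypothetical, and the sample uses a shortened, made-up column layout rather than the real file.

```python
import csv
import io


def read_rows(file_obj):
    """Parse CSV data from an open file object and skip the header row."""
    reader = csv.reader(file_obj)
    next(reader, None)  # skip the header row, if there is one
    # Drop completely empty lines; csv.reader yields them as empty lists
    return [row for row in reader if row]


# Made-up sample data standing in for the real file (assumption)
sample = io.StringIO(
    'world_rank,institution,country,score,year\n'
    '1,"Harvard University",USA,100,2012\n'
    '2,"University of California, Berkeley",USA,90.5,2012\n'
)

rows = read_rows(sample)
print(rows[1])  # ['2', 'University of California, Berkeley', 'USA', '90.5', '2012']
```

With a real file, the same function could be called as read_rows(open(path, encoding='utf-8', newline='')). Note that csv.reader also removes the surrounding double quotes, so the later strip('"') calls would become unnecessary.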


In [7]:
# This function filters universities based on country, year and minimum score input by user.
# The resulting data is written to a new file 'filtered_universities.csv'.
# It first presents a list of available countries, years and score range for the user to choose from.
def filter_universities(data):

    # Extracts the unique countries present in the dataset.
    # For each row in data, the country (row[2]) is extracted, surrounding double quotes are stripped off,
    # and the value is added to a set to ensure uniqueness.
    countries = list(set([row[2].strip('"') for row in data]))

    # Prints the available countries for user's selection.
    print("Available countries: ", '\n'.join(countries))

    # Initializes a variable 'country' to store user's input. The variable is initialized with an empty string.
    country = ''

    # Converts the list of countries to lowercase for case-insensitive comparison with user's input.
    countries_to_check = [i.lower() for i in countries]

    # Prompts the user to input a country until a valid country from the available options is provided.
    while country not in countries_to_check:
        country = input("Enter one of the available countries: ").lower()

    # Extracts the unique years present in the dataset, in ascending order.
    # For each row in data, the year (row[13]) is extracted and converted to an integer.
    years = sorted(set(int(row[13]) for row in data))

    # Prints the available years for user's selection.
    print("Available years: ", ', '.join(map(str, years)))

    # Initializes a variable 'year' to store user's input. The variable is initialized with 0.
    year = 0

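    # Prompts the user to input a year until a valid year from the available options is provided.
    # Non-numeric input is ignored and the prompt is repeated instead of raising a ValueError.
    while year not in years:
        try:
            year = int(input("Enter one of the available years: "))
        except ValueError:
            continue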

    # Extracts all scores present in the dataset. For each row in data, the score (row[12]) is extracted and converted to a float.
    scores = [float(row[12]) for row in data]

    # Calculates the minimum and maximum score for user's guidance.
    min_score = min(scores)
    max_score = max(scores)

    # Prints the range of scores for user's selection.
    print(f"Score can range from {min_score} to {max_score}")

    # Initializes a variable 'score' to store user's input. The variable is initialized with -1.
    score = -1

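    # Prompts the user to input a minimum score within the range until a valid score is provided.
    # Non-numeric input is ignored and the prompt is repeated instead of raising a ValueError.
    while not(min_score <= score <= max_score):
        try:
            score = float(input("Enter a score within the range: "))
        except ValueError:
            continue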

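    # Filters the data based on user's input.
    # A university is selected if its country and year match the user's input
    # and its score is at least the minimum score entered.
    # Double quotes are stripped from the country, as was done when building the list of countries.
    filtered_data = [row for row in data if row[2].strip('"').lower() == country and int(row[13]) == year and float(row[12]) >= score]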

    # If no universities are found that match the criteria, a message is printed to inform the user.
    # Otherwise, the filtered data is written to a new CSV file.
    if not filtered_data:
        print("No universities found that match the criteria.")
    else:
        with open('filtered_universities.csv', 'w', encoding='utf-8') as f:
            for row in filtered_data:
                f.write(','.join(row) + '\n')
        print('file is ready!')

# Run function
filter_universities(dataset)

Available countries:  Germany
Uganda
Switzerland
Cyprus
Romania
Hong Kong
New Zealand
Japan
Australia
Singapore
Iceland
South Africa
Croatia
Hungary
Lebanon
Saudi Arabia
Bulgaria
France
Egypt
Czech Republic
Puerto Rico
Sweden
Greece
Lithuania
Brazil
Belgium
China
United Kingdom
Portugal
Slovak Republic
Taiwan
Austria
Ireland
Canada
Turkey
Chile
Russia
Finland
Netherlands
Mexico
Spain
Thailand
Malaysia
Argentina
Serbia
Uruguay
Norway
India
Estonia
Colombia
United Arab Emirates
Italy
Slovenia
Israel
USA
Poland
Denmark
South Korea
Iran
Enter one of the available countries: Germany
Available years:  2012, 2013, 2014, 2015
Enter one of the available years: 2014
Score can range from 43.36 to 100.0
Enter a score within the range: 50
file is ready!
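
Because filter_universities mixes prompting, filtering, and file writing, the filtering rule itself is hard to check without typing answers. A minimal sketch of that rule as a separate pure function, assuming the same column positions as the notebook (country at index 2, score at index 12, year at index 13). The names filter_rows and make_row and all sample values are hypothetical.

```python
def filter_rows(data, country, year, min_score):
    """Return rows for `country` and `year` whose score is at least `min_score`."""
    wanted = country.strip().lower()
    return [
        row for row in data
        if row[2].strip('"').lower() == wanted
        and int(row[13]) == year
        and float(row[12]) >= min_score
    ]


def make_row(name, country, score, year):
    """Build a 14-column row with made-up values in the positions the notebook uses."""
    return ['0', name, country] + [''] * 9 + [str(score), str(year)]


# Made-up universities, for illustration only
sample = [
    make_row('Alpha University', 'Germany', 55.0, 2014),
    make_row('Beta University', 'Germany', 45.0, 2014),
    make_row('Gamma University', 'Germany', 60.0, 2013),
    make_row('Delta University', 'USA', 90.0, 2014),
]

matches = filter_rows(sample, 'germany', 2014, 50)
print([row[1] for row in matches])  # ['Alpha University']
```

Keeping the rule in its own function means the interactive part only has to collect three valid values and pass them on.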


In [8]:
# This function computes the average score of universities for each year and keeps only those universities
# whose score is higher than the average for their respective year. The results are written to the 'above_average.csv' file.
def above_average_by_year(data):
    # Extract a set of unique years present in the dataset. For each row in data, 
    # the year (row[13]) is extracted and converted to an integer.
    years = list(set([int(row[13]) for row in data]))
    
    # Initialize an empty dictionary to store the average score for each year,
    # e.g. {2012: 46.7, 2013: 55, 2014: 55.6} (illustrative values).
    year_avg_score = {}
    # Iterate over the unique years, e.g. [2012, 2013, 2014, 2015].
    for year in years:
        # Extract scores for the current year and convert them to float.
        scores = [float(row[12]) for row in data if int(row[13]) == year]

        # Compute the average score for the current year.
        avg_score = sum(scores) / len(scores)
        
        # Store the average score in the dictionary using the year as the key.
        year_avg_score[year] = avg_score

    # Filter the data to keep only rows where the score (converted to float) is above 
    # the average score of its year.
    above_average_data = [row for row in data if float(row[12]) > year_avg_score[int(row[13])]]
    
    # Write the above average data to a new CSV file.
    with open('above_average.csv', 'w', encoding='utf-8') as f:
        for row in above_average_data:
            f.write(','.join(row) + '\n')
    print('file is ready!')

# Call the function
above_average_by_year(dataset)

file is ready!
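
The function above scans the whole dataset once per year to collect scores. A minimal sketch of the same per-year average computed in a single pass, assuming the notebook's column positions (score at index 12, year at index 13). The names average_score_by_year and make_row and the sample values are hypothetical.

```python
from collections import defaultdict


def average_score_by_year(data):
    """Return {year: average score}, reading each row exactly once."""
    totals = defaultdict(float)  # running sum of scores per year
    counts = defaultdict(int)    # number of rows seen per year
    for row in data:
        year = int(row[13])
        totals[year] += float(row[12])
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in totals}


def make_row(name, score, year):
    """Build a 14-column row with made-up values in the positions the notebook uses."""
    return ['0', name, 'USA'] + [''] * 9 + [str(score), str(year)]


# Made-up universities, for illustration only
sample = [
    make_row('Alpha University', 100, 2012),
    make_row('Beta University', 50, 2012),
    make_row('Gamma University', 60, 2013),
    make_row('Delta University', 40, 2013),
    make_row('Epsilon University', 80, 2013),
]

averages = average_score_by_year(sample)
above = [row[1] for row in sample if float(row[12]) > averages[int(row[13])]]
print(averages)  # {2012: 75.0, 2013: 60.0}
print(above)     # ['Alpha University', 'Epsilon University']
```

Gamma University scores exactly the 2013 average and is therefore excluded, which matches the strict > comparison used in above_average_by_year.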


In [9]:
# This function sorts universities based on score and writes to a new file 'sorted_universities.csv' 
# only those universities that belong to a specific country input by the user.
def sorted_universities(data):
    # Extract a set of unique countries present in the dataset. For each row in data, 
    # the country (row[2]) is extracted and added to a Python set to ensure uniqueness.
    countries = list(set([row[2].strip('"') for row in data]))
    
    # Print the available countries for user's selection.
    print("Available countries: ", '\n'.join(countries))
    
    # Initialize a variable 'country' to store user's input. The variable is initialized with an empty string.
    country = ''
    
    # Convert the list of countries to lowercase for case-insensitive comparison with user's input.
    countries_to_check = [i.lower() for i in countries]
    
    # Prompt the user to input a country until a valid country from the available options is provided.
    while country not in countries_to_check:
        country = input("Enter one of the available countries: ").lower()

    # Sort the data by score (converted to float) in ascending order.
    # Only rows where the country matches the user's input are included.
    # The key is a lambda: an anonymous function, that is, a function without a name
    # that otherwise works like a regular one. It is equivalent to a named helper
    # such as `def func(x): return float(x[12])`.
    # Rows with equal scores keep their original relative order,
    # e.g. [[Harvard, 3], [oxford, 3], [cambridge, 5]].
    sorted_data = sorted([row for row in data if row[2].strip('"').lower() == country], key=lambda x: float(x[12]))

    # Write the sorted data to a new CSV file.
    with open('sorted_universities.csv', 'w', encoding='utf-8') as f:
        for row in sorted_data:
            f.write(','.join(row) + '\n')
    print('file is ready!')

# Call the function with dataset as the argument
sorted_universities(dataset)

Available countries:  Germany
Uganda
Switzerland
Cyprus
Romania
Hong Kong
New Zealand
Japan
Australia
Singapore
Iceland
South Africa
Croatia
Hungary
Lebanon
Saudi Arabia
Bulgaria
France
Egypt
Czech Republic
Puerto Rico
Sweden
Greece
Lithuania
Brazil
Belgium
China
United Kingdom
Portugal
Slovak Republic
Taiwan
Austria
Ireland
Canada
Turkey
Chile
Russia
Finland
Netherlands
Mexico
Spain
Thailand
Malaysia
Argentina
Serbia
Uruguay
Norway
India
Estonia
Colombia
United Arab Emirates
Italy
Slovenia
Israel
USA
Poland
Denmark
South Korea
Iran
Enter one of the available countries: USA
file is ready!
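
The cell above sorts in ascending order and joins fields with commas by hand. A minimal sketch of two variations, assuming a highest-score-first listing is wanted and that the standard-library csv module may be used for output: a named key function with reverse=True, and csv.writer, which quotes any field containing a comma. The names score_key and make_row and the sample values are hypothetical, and an in-memory buffer stands in for the output file.

```python
import csv
import io


def score_key(row):
    """Named equivalent of `lambda x: float(x[12])`."""
    return float(row[12])


def make_row(name, score, year):
    """Build a 14-column row with made-up values in the positions the notebook uses."""
    return ['0', name, 'USA'] + [''] * 9 + [str(score), str(year)]


# Made-up universities, for illustration only; one name contains a comma
sample = [
    make_row('Alpha University', 51.3, 2015),
    make_row('Beta, Gamma University', 98.7, 2015),
    make_row('Delta University', 75.0, 2015),
]

# reverse=True puts the highest score first
ranked = sorted(sample, key=score_key, reverse=True)
print([row[1] for row in ranked])

# csv.writer adds quotes where needed, so the rows can be read back unchanged
buffer = io.StringIO()
csv.writer(buffer).writerows(ranked)
round_trip = list(csv.reader(io.StringIO(buffer.getvalue())))
print(round_trip == ranked)  # True
```

When writing to a real file instead of a buffer, the csv documentation recommends opening it with newline='' so that line endings are not translated twice.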
