Project 2: object-oriented
## How important are academic achievements for later success?

In this project, I revisit the research question from my first assignment: **“How important are academic achievements for later success?”** This time I implement the analysis with an **object-oriented programming** approach instead of a procedural one. The goal is to gain a deeper understanding of how and when to apply object-oriented programming while analyzing the same dataset and research question.

In [1]:
from typing import Optional
from collections import Counter

### Creating a Data Class to model the dataset entries 

In [2]:
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    profession: str
    degree: str
    field: str
    institution: str
    graduation_year: Optional[int] 
    country: str
    university_global_ranking: Optional[int]
    gpa: float
    scholarship_award: str
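
The `@dataclass` decorator generates `__init__`, `__repr__`, and `__eq__` from the annotated fields, which is why the records print as readable `Person(...)` entries later on. A minimal sketch of that behaviour, using a reduced, hypothetical `MiniPerson` class with only three of the fields (the values come from the first printed record):

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class MiniPerson:  # hypothetical, reduced stand-in for Person
    name: str
    graduation_year: Optional[int]
    gpa: float


a = MiniPerson(name="Elon Musk", graduation_year=1997, gpa=3.7)
b = MiniPerson(name="Elon Musk", graduation_year=1997, gpa=3.7)

print(a)          # uses the generated __repr__
print(a == b)     # generated __eq__ compares field by field
print(asdict(a))  # converts the instance into a plain dict
```

Because equality is field-based, two separately created records with identical values compare as equal, which would not be the case for a plain class without a custom `__eq__`.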

### Reading the CSV file

In [3]:
import csv

class CSVReader:
    def read(self, file_path: str) -> list[Person]:
        persons = []
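        # newline="" is what the csv module recommends for files passed to csv.reader
        with open(file_path, newline="") as csvdatei:
            csv_reader_object = csv.reader(csvdatei)
            for index, row in enumerate(csv_reader_object):
                if index == 0:  # skip the header row
                    continue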
                person = Person(
                    name=row[0],
                    profession=row[1],
                    degree=row[2],
                    field=row[3],
                    institution=row[4],
                    graduation_year=self.parse_int(row[5]),
                    country=row[6],
                    university_global_ranking=self.parse_int(row[7]),
                    gpa=self.parse_float(row[8]),
                    scholarship_award=row[9]
                )
                persons.append(person)
        return persons
    
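    def parse_float(self, value: str) -> float:
        # Non-numeric or empty cells become NaN so they can be filtered out later
        try:
            return float(value)
        except ValueError:
            return float('nan')

    def parse_int(self, value: str) -> Optional[int]:
        # Non-numeric or empty cells become None
        try:
            return int(value)
        except ValueError:
            return None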
        
reader = CSVReader()
persons = reader.read('./data/successful_educations.csv') 
persons              


[Person(name='Elon Musk', profession='Entrepreneur', degree='Bachelor of Science', field='Physics & Economics', institution='University of Pennsylvania', graduation_year=1997, country='USA', university_global_ranking=13, gpa=3.7, scholarship_award='Dean’s List'),
 Person(name='Bill Gates', profession='Entrepreneur', degree='Dropped Out', field='computer science', institution='Harvard University', graduation_year=1975, country='USA', university_global_ranking=5, gpa=nan, scholarship_award='Dean’s List'),
 Person(name='Sundar Pichai', profession='Tech Executive', degree='Master of Science', field='Material Sciences and Engineering', institution='Stanford University', graduation_year=1995, country='USA', university_global_ranking=3, gpa=3.8, scholarship_award='Fellowship in Engineering'),
 Person(name='Sheryl Sandberg', profession='Tech Executive', degree='Master of Business Administration', field='Business Administration', institution='Harvard Business School', graduation_year=1995, coun
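
The reader above addresses columns by position (`row[0]` to `row[9]`), so it depends on the column order of the file. As a possible alternative, `csv.DictReader` addresses columns by header name. The sketch below is only an illustration: the header names and the `N/A` cell are assumptions, since the actual header row of `successful_educations.csv` is not shown here, and an in-memory sample stands in for the file.

```python
import csv
import io
from typing import Optional

# Hypothetical header names and raw cell values; the real file may differ.
SAMPLE = (
    "Name,Graduation Year,GPA\n"
    "Elon Musk,1997,3.7\n"
    "Bill Gates,1975,N/A\n"
)


def parse_int(value: str) -> Optional[int]:
    try:
        return int(value)
    except ValueError:
        return None


def parse_float(value: str) -> float:
    try:
        return float(value)
    except ValueError:
        return float("nan")


rows = []
for record in csv.DictReader(io.StringIO(SAMPLE)):
    # DictReader consumes the header row itself, so no index check is needed
    rows.append(
        {
            "name": record["Name"],
            "graduation_year": parse_int(record["Graduation Year"]),
            "gpa": parse_float(record["GPA"]),
        }
    )

print(rows[0])
```

With this approach, a reordered CSV would still be read correctly, while a renamed column would fail loudly with a `KeyError` instead of silently filling the wrong field.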

### Creating a class to represent the DataFrame + adding methods for data analysis

In [4]:
class Dataframe():
    def __init__(self, persons: list[Person]):
        self.persons = persons

    def average_gpa(self) -> float:
        gpas = [p.gpa for p in self.persons if not self.is_nan(p.gpa)]
        return sum(gpas) / len(gpas) if gpas else 0.0

    def is_nan(self, value) -> bool:
        return value != value  
    
    def no_degree_share(self) -> float:
        if not self.persons:
            return 0.0
        counter = 0
        for person in self.persons:
            if person.degree in {'No Degree', 'Dropout', 'Dropped Out'}:
                counter = counter + 1
        # Divide by this instance's data, not by the global `persons` list
        return counter / len(self.persons)

    def degree_distribution(self) -> dict:
        bachelor_labels = {
            "Bachelor of Science", "Undergraduate Studies", "Bachelor of Arts", "Bachelor", "Engineering"
        }
        master_labels = {
            "Master of Science", "Master of Business Administration", "Master of Arts", "MBA", "Master/MBA", "Master"
        }
        doctor_labels = {
            "Juris Doctor", "PhD", "MD", "Doctor of Philosophy"
        }
        no_degree_labels = {
            "Dropout", "No Degree", "Dropped Out"
        }

        distribution = {
            "Bachelor": 0,
            "Master": 0,
            "Doctor": 0,
            "No Degree": 0,
            "Other": 0
    
        }

        for person in self.persons:
            degree = person.degree.strip()

            if degree in bachelor_labels:
                distribution["Bachelor"] += 1
            elif degree in master_labels:
                distribution["Master"] += 1
            elif degree in doctor_labels:
                distribution["Doctor"] += 1
            elif degree in no_degree_labels:
                distribution["No Degree"] += 1
            else:
                distribution["Other"] += 1
    
        
        # Convert the counts into shares of all persons
        total = len(self.persons)
        return {label: count / total for label, count in distribution.items()}
    
    
    def top_university_share(self, top_n: int = 20) -> float:
        top_ranked = 0
        total_ranked = 0

        for person in self.persons:
            total_ranked += 1
            if person.university_global_ranking is not None and person.university_global_ranking <= top_n:
                top_ranked += 1

        if total_ranked == 0:
            return 0.0  
        return top_ranked / total_ranked
    
    def field_distribution(self, top_n=10):
        fields = [p.field.strip().lower() for p in self.persons if p.field]
        return Counter(fields).most_common(top_n)
  
    def scholarship_share(self) -> float:
        total = len(self.persons)
        if total == 0:
            return 0.0

        with_scholarship = sum(
            1 for person in self.persons
            if person.scholarship_award.strip().lower() not in {"n/a"}
        )

        return round((with_scholarship / total) * 100, 2)

    def top_institutions(self, top_n=3):
        institutions = [p.institution.strip() for p in self.persons if p.institution]
        return Counter(institutions).most_common(top_n)


In [5]:
df = Dataframe(persons)

### 1. Question: Do people with high GPAs become more successful?

In [6]:

print("Average GPA:", df.average_gpa())


Average GPA: 3.771764705882353


In [7]:
print("Share without a degree:", df.no_degree_share())

Share without a degree: 0.07407407407407407


### First conclusion
The average GPA for university students in the U.S. is approximately 3.1. In contrast, the average GPA of the successful individuals in my dataset is around 3.8 (counting only the entries that report a GPA). This suggests that a high GPA can play an important role in achieving career success.
However, 7.4% of the individuals achieved success without completing a university degree, which shows that academic excellence, while helpful, is not a strict requirement for success.
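
To make the share more tangible, it can be converted back into a head count. The sketch below assumes that all 108 rows mentioned in the final remarks are present in `persons`; the share is copied from the output above.

```python
TOTAL_ROWS = 108  # dataset size stated in the final remarks (assumed to match len(persons))

no_degree_share = 0.07407407407407407  # output of df.no_degree_share()
no_degree_count = round(no_degree_share * TOTAL_ROWS)

print(no_degree_count)
print(f"{no_degree_share:.1%}")
```

Under that assumption, the 7.4% corresponds to 8 individuals, which is a small group to draw conclusions from.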

### 2. Question: What kind of academic degree is most common among successful individuals?

In [8]:
print("Degree Distribution", df.degree_distribution())

Degree Distribution {'Bachelor': 0.46296296296296297, 'Master': 0.28703703703703703, 'Doctor': 0.12962962962962962, 'No Degree': 0.07407407407407407, 'Other': 0.046296296296296294}


### Second conclusion
In my dataset, about 46% of the successful individuals hold a bachelor's degree, 29% have earned a master's degree, and 13% hold a doctoral degree. About 7% have no academic degree at all, and the remaining 5% fall into the "Other" category.
According to the [Census Bureau Educational Attainment Data](https://www.census.gov/newsroom/press-releases/2023/educational-attainment-data.html), in the U.S. in 2022, 23% of the population aged 25 and older held a bachelor's degree and 14% had earned a master's or doctoral degree. The remaining 63% had no university degree at all.


Compared to the general U.S. population, successful individuals in my dataset are far more likely to hold a university degree, especially at the Master's and Doctoral level. This suggests that while a degree may not be strictly necessary for success, higher education is much more common among successful people than in the general population.
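
The comparison can be made explicit by putting both sets of shares side by side. This is only a rough sketch: the dataset shares are copied from the `degree_distribution()` output, the population shares are the figures quoted above, and "Master" and "Doctor" are added together to match the census grouping.

```python
# Shares from the degree_distribution() output
dataset = {
    "Bachelor": 0.46296296296296297,
    "Master": 0.28703703703703703,
    "Doctor": 0.12962962962962962,
}
# U.S. population shares as quoted in the text (age 25 and older, 2022)
census = {"Bachelor": 0.23, "Master or Doctor": 0.14}

# Regroup the dataset shares so that the labels match the census figures
combined = {
    "Bachelor": dataset["Bachelor"],
    "Master or Doctor": dataset["Master"] + dataset["Doctor"],
}

# Factor by which each degree level is more common in the dataset
ratios = {label: combined[label] / census[label] for label in census}

for label, ratio in ratios.items():
    print(
        f"{label}: {combined[label]:.0%} in the dataset vs "
        f"{census[label]:.0%} in the population (factor {ratio:.1f})"
    )
```

With these figures, bachelor's degrees are about twice as common and master's or doctoral degrees about three times as common in the dataset as in the general population.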

### 3. Question: Are people who attend top-ranked universities more likely to become successful?

In [9]:
print("Top 20 university share:", df.top_university_share())

Top 20 university share: 0.4537037037037037


### Third conclusion
Among the successful individuals in my dataset, 45% attended one of the top 20 universities worldwide.
Considering the total number of students worldwide and the limited capacity of the top 20 universities, this percentage is strikingly high. It suggests that people who attended top-ranked universities are more likely to become successful.
The institution counts under question 6 show which universities most of these individuals attended, and all of them are well-known names.

### 4. Question: Is the field of study important for success?

In [10]:
print("Most common field_distribution:", df.field_distribution())

Most common field_distribution: [('computer science', 13), ('economics', 11), ('business', 9), ('engineering', 9), ('business administration', 4), ('law', 4), ('medicine', 4), ('english', 3), ('n/a', 3), ('management', 2)]


### Fourth conclusion
The most common fields among successful individuals are Computer Science, Economics, Business, and Engineering. This suggests that certain areas of study, particularly those related to technology, business, and economics, may have a stronger connection to later success. However, a variety of other fields such as Law, Medicine, and English are also represented, indicating that success can come from diverse academic backgrounds.
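
The counts above depend on how `field_distribution` normalizes the labels: `strip().lower()` merges variants that differ only in case or surrounding whitespace before `Counter` tallies them. A small sketch with illustrative labels (the spelling variants are hypothetical):

```python
from collections import Counter

# Illustrative raw labels; the spelling variants are hypothetical
raw_fields = ["Computer Science", "computer science ", "Economics", " economics", "Law"]

# Same normalization as in field_distribution()
normalized = [f.strip().lower() for f in raw_fields if f]

raw_counts = Counter(raw_fields)
normalized_counts = Counter(normalized)

print(raw_counts.most_common(2))
print(normalized_counts.most_common(2))
```

This normalization does not merge related labels with different wording, which is why "business" and "business administration" appear as separate entries in the output.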

### 5. Question: How important are scholarships/awards for success?

In [11]:
print("scholarship_share:", df.scholarship_share())

scholarship_share: 74.07


### Fifth conclusion
About 74% of the successful individuals in the dataset have received scholarships or awards. This high percentage indicates that scholarships and awards are common among successful people and may play a significant role in supporting their achievements. 

### 6. Question: Which university produces the most successful individuals? 

In [12]:
print("top_3_institutions:", df.top_institutions())

top_3_institutions: [('Harvard University', 10), ('Stanford University', 9), ('UCLA', 4)]


### Sixth conclusion
The top three universities producing the most successful individuals in the dataset are Harvard University, Stanford University, and UCLA. This suggests that attending prestigious institutions, especially those with strong reputations like Harvard and Stanford, may increase the likelihood of later success. However, success is not limited to just the very top universities, as shown by UCLA’s presence in the top three.

## Final conclusion
#### How important are academic achievements for later success?

Academic achievement appears to play an important role in achieving career success. Most individuals in the dataset hold university degrees and the average GPA is significantly above the U.S. average. Nearly half of them attended one of the top 20 universities worldwide, suggesting that elite institutions may offer an advantage on the path to success.

In conclusion, academic achievements such as earning a degree, maintaining a strong GPA, attending a prestigious university, and receiving scholarships or awards are common among successful individuals and can certainly support a successful career. However, success can take many forms, and not everyone in the dataset followed the same path. Some achieved success without formal degrees.

It is interesting to note that certain areas of study, particularly those related to technology, business, and economics, may have a stronger connection to later success. Additionally, the top three most attended universities among successful individuals are Harvard, Stanford, and UCLA, highlighting the influence of prestigious institutions on career outcomes.
 
We also have to consider that success is hard to measure exactly, and that grading systems and university rankings vary between countries. Additionally, the dataset contains only 108 rows (even fewer after cleaning), and it is biased because it includes only successful people.