Clean the pension fund dataset!
The dataset you're about to work with contains an excerpt of a much larger dataset with all the Danish pension funds' investments. It is provided as a CSV file with the following columns: name of the pension fund, name of the company, invested amount. The separator of the CSV file is the semicolon (;).

Your task is to clean and format the data according to the guidelines below and then write it to a new (!) CSV file.

Please note that the only module you will have to import is Python's built-in csv module. Some of the tasks can be solved using other modules, but this is entirely optional, and you will most likely solve the tasks faster by writing your own function than by searching for an existing one (be aware that this applies only to this exercise; in other circumstances it may be much better to use existing modules!).

In this exercise, you should focus on breaking the code into several helper functions. Work on making each of the helper functions return the desired output, which in turn involves looking at what exactly you provide as input to the function.

Complete the following tasks - but think about the order in which you do them first!

- Remove any wrong or odd row entries.
- Read the file into memory.
- All the columns with the company names begin with 'company_name:'. Remove this, so that the entry only contains the company's name.
- Write the nice and clean data to another CSV file.
- In the raw file, the invested sums are formatted in different ways. The sums for AkademikerPension are given as decimal numbers, and the sums for Industriens Pension are given in million DKK (e.g. 130 means 130000000). Only PenSam and Velliv are already formatted correctly. All of the sums have to be written out in full as whole numbers without decimals, e.g. if the investment is 5.9 million DKK, the entry should be 5900000 and nothing else.
For the tasks involving string manipulation, you can find help here: https://github.com/jakevdp/WhirlwindTourOfPython/blob/master/14-Strings-and-Regular-Expressions.ipynb
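One possible ordering of the tasks above is read, filter, clean, write. The sketch below runs that order on a made-up three-row sample held in memory. The helper names (read_rows, drop_bad_rows, clean_company, run_pipeline) and the sample rows are illustrative assumptions, not part of the exercise, and the amount formatting step is left out to keep it short.

```python
import csv
import io

# Hypothetical sample in the same layout as the raw file (semicolon-separated).
RAW = (
    "pension_fund;company;invested_amount\n"
    "Industriens Pension;company_name: BYD CO LTD -A;7\n"
    "Nofund;company_name: dirty row;9999999\n"
    "PenSam;company_name: Example A/S;1200\n"
)


def read_rows(handle):
    """Return the header and the data rows of a semicolon-separated file."""
    reader = csv.reader(handle, delimiter=";")
    header = next(reader)
    return header, list(reader)


def drop_bad_rows(rows, bad_fund="Nofund"):
    """Return only the rows that do not belong to the placeholder fund."""
    return [row for row in rows if row[0] != bad_fund]


def clean_company(name, prefix="company_name: "):
    """Return the company name without the leading prefix."""
    return name[len(prefix):] if name.startswith(prefix) else name


def run_pipeline(raw_text):
    """Read, filter, clean and write the sample; return the output CSV text."""
    header, rows = read_rows(io.StringIO(raw_text))
    rows = drop_bad_rows(rows)
    rows = [[fund, clean_company(company), amount]
            for fund, company, amount in rows]
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


CLEANED = run_pipeline(RAW)
print(CLEANED)
```

Each helper takes plain lists or strings and returns a new value, which makes it easy to check one step at a time before chaining them.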

If you are done with the tasks above, please do the following:

Create a dictionary with the name of the pension fund as the key, and a list of lists as the value of each fund. The list should contain the largest invested sum in a single company and the median investment. It should be in the following format: [[company name (str), invested sum (int)], [company name (str), invested sum (int)]] with the entry at index 0 being the company where the corresponding pension fund has invested the largest amount of money.
Make sure all your helper functions contain docstrings that follow PEP 257, the docstring conventions referenced by PEP 8.
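The dictionary described above could be built roughly as follows. This is a sketch under assumptions: the function name summarise_funds and the sample rows are made up, the amounts are assumed to be already cleaned to whole numbers, and for an even number of investments the lower of the two middle entries is taken as the median so that it still maps to a single company (the exercise does not say how to handle that case).

```python
def summarise_funds(rows):
    """Return {fund: [[top company, top sum], [median company, median sum]]}.

    rows is a list of [fund, company, amount] lists with whole-number amounts.
    For an even number of investments the lower middle entry is used
    (an assumption, the exercise does not specify this).
    """
    by_fund = {}
    for fund, company, amount in rows:
        by_fund.setdefault(fund, []).append([company, int(amount)])

    summary = {}
    for fund, investments in by_fund.items():
        ordered = sorted(investments, key=lambda item: item[1])
        largest = ordered[-1]
        median = ordered[(len(ordered) - 1) // 2]
        summary[fund] = [largest, median]
    return summary


# Hypothetical cleaned rows, not taken from the real dataset.
SAMPLE = [
    ["PenSam", "A", "300"],
    ["PenSam", "B", "100"],
    ["PenSam", "C", "200"],
    ["Velliv", "D", "50"],
    ["Velliv", "E", "70"],
]
RESULT = summarise_funds(SAMPLE)
print(RESULT)
```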

In [6]:
import csv

file_path = "C:\\Users\\loyda\\Documents\\Python Scripts\\pension_fund_data.csv"
#file_path = "sample_Data/pension_fund_data.csv"

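# newline='' is what the csv module expects; the file is read as UTF-8.
with open(file_path, newline='', encoding='utf-8') as file:
    reader = csv.reader(file, delimiter=';')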
    header = next(reader)
    fund_data = [row for row in reader]
    
print(header)
print(fund_data)
        

['pension_fund', 'company ', 'invested_amount']
[['AkademikerPension', 'company_name: ANDRITZ AG', '9035889.67000961'], ['AkademikerPension', 'company_name: Verbund AG', '535484.04264'], ['AkademikerPension', 'company_name: Wienerberger AG', '489278.539054215'], ['AkademikerPension', 'company_name: ams AG', '9582835.98899249'], ['AkademikerPension', 'company_name: Oesterreichische Post AG', '283768.092184449'], ['AkademikerPension', 'company_name: Pendal Group Ltd', '1279011.44099595'], ['AkademikerPension', 'company_name: Asaleo Care Ltd', '311739.24'], ['AkademikerPension', 'company_name: Aristocrat Leisure Ltd', '159265.740991766'], ['AkademikerPension', 'company_name: ALS Ltd', '10314327.76'], ['AkademikerPension', 'company_name: Altium Ltd', '14675679.27'], ['AkademikerPension', 'company_name: AMP Ltd', '210881.09'], ['AkademikerPension', 'company_name: AUST AND NZ BANKING GROUP', '15375.58598142'], ['AkademikerPension', 'company_name: Australia & New Zealand Banking Group Ltd', '

In [13]:
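# Find the dirty row (its pension fund is 'Nofund') and drop it from the data.
remove_entry = None
for row in fund_data:
    if row[0] == 'Nofund':
        remove_entry = row

print(remove_entry)

if remove_entry is not None:
    fund_data.remove(remove_entry)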

['Nofund', 'company_name: JUST REMOVE ME, I’M DIRTY DATA!', '9999999']


In [16]:
def remove_company_name(company):
    """Return the company entry without the 'company_name: ' prefix."""
    return company.replace('company_name: ', '')

In [20]:
for row in fund_data:
    #print(remove_company_name(row[1]))
    print(row[1])

company_name: ANDRITZ AG
company_name: Verbund AG
company_name: Wienerberger AG
company_name: ams AG
company_name: Oesterreichische Post AG
company_name: Pendal Group Ltd
company_name: Asaleo Care Ltd
company_name: Aristocrat Leisure Ltd
company_name: ALS Ltd
company_name: Altium Ltd
company_name: AMP Ltd
company_name: AUST AND NZ BANKING GROUP
company_name: Australia & New Zealand Banking Group Ltd
company_name: Afterpay Ltd
company_name: ARB Corp Ltd
company_name: Aurizon Holdings Ltd
company_name: BHP Group Ltd
company_name: BlueScope Steel Ltd
company_name: Brambles Ltd
company_name: carsales.com Ltd
company_name: Challenger Ltd
company_name: Sungard AS New Holdings III LLC
company_name: Superior Energy Services Inc
company_name: a2 Milk Co Ltd/The
company_name: Fisher & Paykel Healthcare Corp Ltd
company_name: Fletcher Building Ltd
company_name: SKY Network Television Ltd
company_name: Xero Ltd
company_name: Copa Holdings SA
company_name: Ayala Land Inc
company_name: BANK OF THE P

company_name: Commonwealth Bank of Australia
company_name: Computershare Ltd
company_name: CSL Ltd
company_name: Dexus
company_name: Evolution Mining Ltd
company_name: Fortescue Metals Group Ltd
company_name: Goodman Group
company_name: GPT Group/The
company_name: Insurance Australia Group Ltd
company_name: Lendlease Corp Ltd
company_name: Magellan Financial Group Ltd
company_name: Medibank Pvt Ltd
company_name: Mirvac Group
company_name: National Australia Bank Ltd
company_name: Newcrest Mining Ltd
company_name: Northern Star Resources Ltd
company_name: Oil Search Ltd
company_name: QBE Insurance Group Ltd
company_name: Ramsay Health Care Ltd
company_name: REA Group Ltd
company_name: Rio Tinto Ltd
company_name: Santos Ltd
company_name: Scentre Group
company_name: SEEK Ltd
company_name: Sonic Healthcare Ltd
company_name: Stockland
company_name: Suncorp Group Ltd
company_name: Sydney Airport
company_name: Tabcorp Holdings Ltd
company_name: Telstra Corp Ltd
company_name: Transurban Group


company_name: KOSHIDAKA HOLDINGS CO LTD
company_name: OKUWA CO LTD
company_name: MEDARTIS HOLDING AG
company_name: MTI LTD
company_name: SANSHIN ELECTRONICS CO LTD
company_name: AUSTRALIAN AGRICULTURAL CO
company_name: MATSUDA SANGYO CO LTD
company_name: FUJIYA CO LTD
company_name: RIKEN KEIKI CO LTD
company_name: HT&E LTD
company_name: AKOUOS INC
company_name: SOFTWARE SERVICE INC
company_name: GRUPO EMPRESARIAL SAN JOSE
company_name: MARVELOUS INC
company_name: DUSKIN CO LTD
company_name: PROMOTORA DE INFORMACIONES-A
company_name: LIBERTY MEDIA CORP-BRAVES A
company_name: DAIWA HOUSE REIT INVESTMENT
company_name: VT HOLDINGS CO LTD
company_name: ICHIKOH INDUSTRIES LTD
company_name: AGRANA BETEILIGUNGS AG
company_name: TONAMI HOLDINGS CO LTD
company_name: KINTETSU DEPT STORE CO LTD
company_name: WORLD HOLDINGS CO LTD
company_name: HEMISPHERE MEDIA GROUP INC
company_name: CONEXIO CORP
company_name: WDB HOLDINGS CO LTD
company_name: DAIICHI JITSUGYO CO LTD
company_name: ARNOLDO MONDADOR

In [23]:
def format_akademikerpension(amount):
    """Return a decimal amount rounded to a whole number, as a string."""
    return str(round(float(amount)))

In [30]:
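def format_industrien_pension(amount):
    """Return an amount given in million DKK as a whole-number string."""
    # round() rather than int(): int() truncates, so a float product that
    # lands just below a whole number would silently lose one krone.
    return str(round(float(amount) * 1000000))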
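The conversion above goes through float, which cannot represent every decimal fraction exactly. As an alternative sketch, the standard library's decimal module does the multiplication in base 10. The name millions_to_dkk is hypothetical, and the input is assumed to be a string with a dot as decimal separator and at most six decimal places (anything beyond that is truncated by int()).

```python
from decimal import Decimal


def millions_to_dkk(amount):
    """Return an amount given in million DKK as a whole-number string."""
    # Decimal parses the text exactly, so no binary rounding is involved.
    return str(int(Decimal(amount) * 1_000_000))


print(millions_to_dkk("5.9"))
print(millions_to_dkk("130"))
```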

In [31]:
output_list = []

for row in fund_data:
    #print(row)
    new_row = [row[0], remove_company_name(row[1])]
    
    if row[0] == 'AkademikerPension':
        new_row.append(format_akademikerpension(row[2]))
    elif row[0] == 'Industriens Pension':
        new_row.append(format_industrien_pension(row[2]))
    else:
        new_row.append(row[2])
    print(new_row)
    output_list.append(new_row)

['AkademikerPension', 'ANDRITZ AG', '9035890']
['AkademikerPension', 'Verbund AG', '535484']
['AkademikerPension', 'Wienerberger AG', '489279']
['AkademikerPension', 'ams AG', '9582836']
['AkademikerPension', 'Oesterreichische Post AG', '283768']
['AkademikerPension', 'Pendal Group Ltd', '1279011']
['AkademikerPension', 'Asaleo Care Ltd', '311739']
['AkademikerPension', 'Aristocrat Leisure Ltd', '159266']
['AkademikerPension', 'ALS Ltd', '10314328']
['AkademikerPension', 'Altium Ltd', '14675679']
['AkademikerPension', 'AMP Ltd', '210881']
['AkademikerPension', 'AUST AND NZ BANKING GROUP', '15376']
['AkademikerPension', 'Australia & New Zealand Banking Group Ltd', '723353']
['AkademikerPension', 'Afterpay Ltd', '52173247']
['AkademikerPension', 'ARB Corp Ltd', '1264033']
['AkademikerPension', 'Aurizon Holdings Ltd', '464861']
['AkademikerPension', 'BHP Group Ltd', '864932']
['AkademikerPension', 'BlueScope Steel Ltd', '917445']
['AkademikerPension', 'Brambles Ltd', '27064700']
['Akademi

['Industriens Pension', 'AIRPORTS OF THAILAND PCL-FOR', '8000000']
['Industriens Pension', 'PING AN INSURANCE GROUP CO-A', '8000000']
['Industriens Pension', 'ZHEJIANG SANHUA INTELLIGEN-A', '8000000']
['Industriens Pension', 'PETROCHINA CO LTD-A', '8000000']
['Industriens Pension', 'SHENZHEN INOVANCE TECHNOLO-A', '8000000']
['Industriens Pension', 'SIAM CEMENT PUB CO-FOR REG', '7000000']
['Industriens Pension', 'BEIJING ORIENTAL YUHONG-A', '7000000']
['Industriens Pension', 'BYD CO LTD -A', '7000000']
['Industriens Pension', 'NANJING IRON & STEEL CO-A', '7000000']
['Industriens Pension', 'SHANGHAI PHARMACEUTICALS-A', '6000000']
['Industriens Pension', 'NAURA TECHNOLOGY GROUP CO-A', '6000000']
['Industriens Pension', 'BAIDU INC - SPON ADR', '6000000']
['Industriens Pension', 'INNER MONGOLIA YILI INDUS-A', '6000000']
['Industriens Pension', 'HON HAI PRECISION INDUSTRY', '6000000']
['Industriens Pension', 'SANY HEAVY INDUSTRY CO LTD-A', '6000000']
['Industriens Pension', 'ZHEJIANG WEIXING

In [33]:
outputfile = "C:\\Users\\loyda\\Documents\\Python Scripts\\cleaned_file.csv"

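# newline='' stops the csv module from writing blank lines between rows on Windows.
with open(outputfile, 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerows(output_list)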
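A quick way to check the written file is to read it back and compare it with the rows in memory. The sketch below does this with a temporary file so that it runs anywhere; the function name write_and_read_back and the two sample rows are assumptions made for illustration.

```python
import csv
import os
import tempfile


def write_and_read_back(rows):
    """Write rows to a temporary CSV file and return them as read back."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "cleaned_file.csv")
        with open(path, "w", newline="", encoding="utf-8") as file:
            csv.writer(file).writerows(rows)
        with open(path, newline="", encoding="utf-8") as file:
            return list(csv.reader(file))


# Hypothetical rows; the second company name contains a comma on purpose.
ROWS = [
    ["PenSam", "Example A/S", "1200"],
    ["Velliv", "Foo, Bar Ltd", "7000000"],
]
CHECK = write_and_read_back(ROWS)
print(CHECK == ROWS)
```

The comma inside a company name survives the round trip because csv.writer quotes such fields automatically.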