Requirements

1. Input each of the 12 monthly files
2. Create a 'file date' using the month found in the file name. Where the file name contains no month number (a null value), use month 1
3. Clean the Market Cap value to ensure it is the true value as 'Market Capitalisation'. Remove any rows with 'n/a'
4. Categorise the Purchase Price into groupings: 0 to 24,999.99 as 'Low', 25,000 to 49,999.99 as 'Medium', 50,000 to 74,999.99 as 'High', 75,000 to 100,000 as 'Very High'
5. Categorise the Market Cap into groupings: Below 100M as 'Small', Between 100M and below 1B as 'Medium', Between 1B and below 100B as 'Large', 100B and above as 'Huge'
6. Rank the highest 5 purchases per combination of: file date, Purchase Price Categorisation and Market Capitalisation Categorisation.
7. Output only records with a rank of 1 to 5

In [2]:
import pandas as pd
import glob

1. Input the 12 monthly files
2. Create a 'file date' using the month found in the file name. Where the file name contains no month number (a null value), use month 1

In [4]:
csv = glob.glob('Preppin Data Inputs/Input/*.csv')

In [5]:
df_list = []

for file in csv:
    df = pd.read_csv(file)
    # Store the source path for now; it is converted to a month name and then a date below
    df['file date'] = file
    df_list.append(df)

In [6]:
df_list

[       id first_name   last_name     Ticker            Sector  Market  \
 0       1      Hinze      Bartak      CYRXW    Transportation  NASDAQ   
 1       2      Lelia     Gresley       GILD       Health Care  NASDAQ   
 2       3  Leicester        Roff        NMZ               NaN    NYSE   
 3       4       Baxy     Fieller      USB^A           Finance    NYSE   
 4       5     Julita   Spradbery       INSI               NaN    NYSE   
 ..    ...        ...         ...        ...               ...     ...   
 995   996   Christel     Sarrell  CLNS^A.CL               NaN    NYSE   
 996   997       Rivi        Rame       EMCB               NaN  NASDAQ   
 997   998       Doti       Facer        SXT  Basic Industries    NYSE   
 998   999    Dorothy  Janauschek       KRNT     Capital Goods  NASDAQ   
 999  1000        Lin      Rosgen       GPAC       Health Care  NASDAQ   
 
                                           Stock Name Market Cap  \
 0                                     Cry

In [7]:
df = pd.concat(df_list, ignore_index=True)

In [8]:
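def month(file_date):
    # Map the month number found in the file path to a month name.
    # Two-digit months are tested first so that '12' is not mistaken for '1' or '2'.
    # A path with no matching digit falls through to January (the null case).
    # Note: any other digit in the path (e.g. in a folder name) would be misread as a month.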
    if '12' in file_date:
        return 'December'
    elif '11' in file_date:
        return 'November'
    elif '10' in file_date:
        return 'October'
    elif '9' in file_date:
        return 'September'
    elif '8' in file_date:
        return 'August'
    elif '7' in file_date:
        return 'July'
    elif '6' in file_date:
        return 'June'
    elif '5' in file_date:
        return 'May'
    elif '4' in file_date:
        return 'April'
    elif '3' in file_date:
        return 'March'
    elif '2' in file_date:
        return 'February'
    else:
        return 'January'
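
The substring checks above depend on the order of the branches and on the path containing no other digits. A minimal alternative sketch, assuming the month is the trailing run of digits in the file name: the names `MOCK_DATA-12.csv` and `MOCK_DATA.csv` used here are hypothetical examples, and `month_from_filename` is an illustrative helper, not part of the original notebook.

```python
import re
from datetime import date
from pathlib import Path

def month_from_filename(path, default=1):
    """Return the month number at the end of a file name, or `default` if there is none."""
    # Use only the file name without folders or extension, so digits elsewhere are ignored.
    stem = Path(path).stem
    # Assumption: the month is the last run of digits in the name, e.g. 'MOCK_DATA-12'.
    match = re.search(r'(\d+)$', stem)
    return int(match.group(1)) if match else default

# Hypothetical file names, for illustration only.
december = month_from_filename('Preppin Data Inputs/Input/MOCK_DATA-12.csv')
january = month_from_filename('Preppin Data Inputs/Input/MOCK_DATA.csv')

# Build the first-of-month date directly, using the same year as the notebook (2023).
file_date = date(2023, december, 1)
```

Returning an integer avoids the round trip through month names before the date is built.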

In [9]:
df['file date'] = df['file date'].apply(month)

In [10]:
df['file date'] = pd.to_datetime(df['file date'] + ' 2023',format='%B %Y')

In [11]:
df

Unnamed: 0,id,first_name,last_name,Ticker,Sector,Market,Stock Name,Market Cap,Purchase Price,file date
0,1,Hinze,Bartak,CYRXW,Transportation,NASDAQ,"CryoPort, Inc.",,$6947.28,2023-01-01
1,2,Lelia,Gresley,GILD,Health Care,NASDAQ,"Gilead Sciences, Inc.",$83.79B,$77627.18,2023-01-01
2,3,Leicester,Roff,NMZ,,NYSE,Nuveen Municipal High Income Opportunity Fund,$794.98M,$40676.73,2023-01-01
3,4,Baxy,Fieller,USB^A,Finance,NYSE,U.S. Bancorp,,$42441.50,2023-01-01
4,5,Julita,Spradbery,INSI,,NYSE,Insight Select Income Fund,$211.52M,$41908.66,2023-01-01
...,...,...,...,...,...,...,...,...,...,...
11995,996,Brant,Schwartz,BCBP,Finance,NASDAQ,"BCB Bancorp, Inc. (NJ)",$174.42M,$94153.43,2023-08-01
11996,997,Terry,Leindecker,XIN,Basic Industries,NYSE,Xinyuan Real Estate Co Ltd,$317.4M,$21159.77,2023-08-01
11997,998,Zaccaria,Slatcher,ADBE,Technology,NASDAQ,Adobe Systems Incorporated,$68.19B,$28113.08,2023-08-01
11998,999,Gayelord,Blenkhorn,AOI,Consumer Services,NYSE,"Alliance One International, Inc.",$134.83M,$84451.49,2023-08-01


3. Clean the Market Cap value to ensure it is the true value as 'Market Capitalisation'. Remove any rows with 'n/a'

In [13]:
# Take a copy so that the column assignments below act on an independent frame
df = df.dropna(subset=['Market Cap']).copy()
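
Dropping nulls removes the 'n/a' rows because `pd.read_csv` converts the string `n/a` to `NaN` by default when the files are loaded. The sketch below illustrates this on a small inline sample; the three rows are made-up illustration data, not taken from the input files.

```python
import io
import pandas as pd

# A tiny stand-in for one monthly file (illustrative values only).
sample = io.StringIO(
    "Ticker,Market Cap\n"
    "GILD,$83.79B\n"
    "CYRXW,n/a\n"
    "NMZ,$794.98M\n"
)
demo = pd.read_csv(sample)

# 'n/a' is among the strings read_csv treats as missing by default.
n_missing = int(demo['Market Cap'].isna().sum())

# Drop the missing rows and take a copy before modifying columns further.
cleaned = demo.dropna(subset=['Market Cap']).copy()
```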

In [14]:
df

Unnamed: 0,id,first_name,last_name,Ticker,Sector,Market,Stock Name,Market Cap,Purchase Price,file date
1,2,Lelia,Gresley,GILD,Health Care,NASDAQ,"Gilead Sciences, Inc.",$83.79B,$77627.18,2023-01-01
2,3,Leicester,Roff,NMZ,,NYSE,Nuveen Municipal High Income Opportunity Fund,$794.98M,$40676.73,2023-01-01
4,5,Julita,Spradbery,INSI,,NYSE,Insight Select Income Fund,$211.52M,$41908.66,2023-01-01
5,6,Steven,McGrale,AP,Capital Goods,NYSE,Ampco-Pittsburgh Corporation,$175.48M,$163.84,2023-01-01
7,8,Elly,Dono,SCM,,NYSE,Stellus Capital Investment Corporation,$213.68M,$55985.51,2023-01-01
...,...,...,...,...,...,...,...,...,...,...
11995,996,Brant,Schwartz,BCBP,Finance,NASDAQ,"BCB Bancorp, Inc. (NJ)",$174.42M,$94153.43,2023-08-01
11996,997,Terry,Leindecker,XIN,Basic Industries,NYSE,Xinyuan Real Estate Co Ltd,$317.4M,$21159.77,2023-08-01
11997,998,Zaccaria,Slatcher,ADBE,Technology,NASDAQ,Adobe Systems Incorporated,$68.19B,$28113.08,2023-08-01
11998,999,Gayelord,Blenkhorn,AOI,Consumer Services,NYSE,"Alliance One International, Inc.",$134.83M,$84451.49,2023-08-01


In [15]:
# regex=False makes '$' a literal character rather than a regex end-of-string anchor
df.loc[:, 'Market Cap'] = df['Market Cap'].str.replace('$','', regex=False)

In [16]:
df

Unnamed: 0,id,first_name,last_name,Ticker,Sector,Market,Stock Name,Market Cap,Purchase Price,file date
1,2,Lelia,Gresley,GILD,Health Care,NASDAQ,"Gilead Sciences, Inc.",83.79B,$77627.18,2023-01-01
2,3,Leicester,Roff,NMZ,,NYSE,Nuveen Municipal High Income Opportunity Fund,794.98M,$40676.73,2023-01-01
4,5,Julita,Spradbery,INSI,,NYSE,Insight Select Income Fund,211.52M,$41908.66,2023-01-01
5,6,Steven,McGrale,AP,Capital Goods,NYSE,Ampco-Pittsburgh Corporation,175.48M,$163.84,2023-01-01
7,8,Elly,Dono,SCM,,NYSE,Stellus Capital Investment Corporation,213.68M,$55985.51,2023-01-01
...,...,...,...,...,...,...,...,...,...,...
11995,996,Brant,Schwartz,BCBP,Finance,NASDAQ,"BCB Bancorp, Inc. (NJ)",174.42M,$94153.43,2023-08-01
11996,997,Terry,Leindecker,XIN,Basic Industries,NYSE,Xinyuan Real Estate Co Ltd,317.4M,$21159.77,2023-08-01
11997,998,Zaccaria,Slatcher,ADBE,Technology,NASDAQ,Adobe Systems Incorporated,68.19B,$28113.08,2023-08-01
11998,999,Gayelord,Blenkhorn,AOI,Consumer Services,NYSE,"Alliance One International, Inc.",134.83M,$84451.49,2023-08-01


In [17]:
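def true_value(x):
    # Convert a suffixed string such as '83.79B' or '794.98M' into a plain number
    if 'M' in x:
        y = x.replace('M','')
        return float(y) * 1000000
    else:
        # Anything without an 'M' is assumed to be in billions
        y = x.replace('B','')
        return float(y) * 1000000000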
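
The same conversion can also be written without a row-by-row function. This is a sketch under the assumption that every value looks like an optional `$`, a number, and a single `M` or `B` suffix; values in any other shape would come out as `NaN` instead of being treated as billions.

```python
import pandas as pd

caps = pd.Series(['$83.79B', '$794.98M', '$175.48M'])

# Split each value into its numeric part and its unit suffix.
parts = caps.str.extract(r'^\$?(?P<number>[\d.]+)(?P<unit>[MB])$')

# Translate the suffix into a multiplier and apply it.
multiplier = parts['unit'].map({'M': 1e6, 'B': 1e9})
market_cap = parts['number'].astype(float) * multiplier
```

The result is a float column, so later numeric comparisons do not rely on an object-typed column.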

In [18]:
df.loc[:, 'Market Cap'] = df['Market Cap'].apply(true_value)

In [19]:
df = df.rename(columns={'Market Cap':'Market Capitalisation'})

In [20]:
df

Unnamed: 0,id,first_name,last_name,Ticker,Sector,Market,Stock Name,Market Capitalisation,Purchase Price,file date
1,2,Lelia,Gresley,GILD,Health Care,NASDAQ,"Gilead Sciences, Inc.",83790000000.0,$77627.18,2023-01-01
2,3,Leicester,Roff,NMZ,,NYSE,Nuveen Municipal High Income Opportunity Fund,794980000.0,$40676.73,2023-01-01
4,5,Julita,Spradbery,INSI,,NYSE,Insight Select Income Fund,211520000.0,$41908.66,2023-01-01
5,6,Steven,McGrale,AP,Capital Goods,NYSE,Ampco-Pittsburgh Corporation,175480000.0,$163.84,2023-01-01
7,8,Elly,Dono,SCM,,NYSE,Stellus Capital Investment Corporation,213680000.0,$55985.51,2023-01-01
...,...,...,...,...,...,...,...,...,...,...
11995,996,Brant,Schwartz,BCBP,Finance,NASDAQ,"BCB Bancorp, Inc. (NJ)",174420000.0,$94153.43,2023-08-01
11996,997,Terry,Leindecker,XIN,Basic Industries,NYSE,Xinyuan Real Estate Co Ltd,317400000.0,$21159.77,2023-08-01
11997,998,Zaccaria,Slatcher,ADBE,Technology,NASDAQ,Adobe Systems Incorporated,68190000000.0,$28113.08,2023-08-01
11998,999,Gayelord,Blenkhorn,AOI,Consumer Services,NYSE,"Alliance One International, Inc.",134830000.0,$84451.49,2023-08-01


4. Categorise the Purchase Price into groupings: 0 to 24,999.99 as 'Low', 25,000 to 49,999.99 as 'Medium', 50,000 to 74,999.99 as 'High', 75,000 to 100,000 as 'Very High'

In [22]:
def purchase_price_cat(x):

    y = x.replace('$','')

    y = float(y)
    
    # Half-open ranges leave no gap between 24,999.99 and 25,000 etc.
    if 0 <= y < 25000:
        return 'Low'
    elif 25000 <= y < 50000:
        return 'Medium'
    elif 50000 <= y < 75000:
        return 'High'
    else:
        return 'Very High'
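
As an alternative sketch, the same bands can be expressed with `pd.cut` on a numeric column. The prices below are illustrative boundary values. The open-ended last bin is an assumption made so that exactly 100,000 still counts as 'Very High'; unlike the function above, negative prices would become `NaN` rather than 'Very High'.

```python
import pandas as pd

# Illustrative prices chosen to sit on or near the band boundaries.
prices = pd.Series([163.84, 24999.99, 25000.00, 74999.99, 75000.00, 100000.00])

# right=False makes each bin include its lower edge and exclude its upper edge.
price_category = pd.cut(
    prices,
    bins=[0, 25_000, 50_000, 75_000, float('inf')],
    labels=['Low', 'Medium', 'High', 'Very High'],
    right=False,
)
```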

In [23]:
df['Purchase Price Category'] = df['Purchase Price'].apply(purchase_price_cat)

In [24]:
df

Unnamed: 0,id,first_name,last_name,Ticker,Sector,Market,Stock Name,Market Capitalisation,Purchase Price,file date,Purchase Price Category
1,2,Lelia,Gresley,GILD,Health Care,NASDAQ,"Gilead Sciences, Inc.",83790000000.0,$77627.18,2023-01-01,Very High
2,3,Leicester,Roff,NMZ,,NYSE,Nuveen Municipal High Income Opportunity Fund,794980000.0,$40676.73,2023-01-01,Medium
4,5,Julita,Spradbery,INSI,,NYSE,Insight Select Income Fund,211520000.0,$41908.66,2023-01-01,Medium
5,6,Steven,McGrale,AP,Capital Goods,NYSE,Ampco-Pittsburgh Corporation,175480000.0,$163.84,2023-01-01,Low
7,8,Elly,Dono,SCM,,NYSE,Stellus Capital Investment Corporation,213680000.0,$55985.51,2023-01-01,High
...,...,...,...,...,...,...,...,...,...,...,...
11995,996,Brant,Schwartz,BCBP,Finance,NASDAQ,"BCB Bancorp, Inc. (NJ)",174420000.0,$94153.43,2023-08-01,Very High
11996,997,Terry,Leindecker,XIN,Basic Industries,NYSE,Xinyuan Real Estate Co Ltd,317400000.0,$21159.77,2023-08-01,Low
11997,998,Zaccaria,Slatcher,ADBE,Technology,NASDAQ,Adobe Systems Incorporated,68190000000.0,$28113.08,2023-08-01,Medium
11998,999,Gayelord,Blenkhorn,AOI,Consumer Services,NYSE,"Alliance One International, Inc.",134830000.0,$84451.49,2023-08-01,Very High


5. Categorise the Market Cap into groupings: Below 100M as 'Small', Between 100M and below 1B as 'Medium', Between 1B and below 100B as 'Large', 100B and above as 'Huge'

In [26]:
def market_cap_cat(x):
    if x < 100000000:
        return 'Small'
    elif x >= 100000000 and x < 1000000000:
        return 'Medium'
    elif x >= 1000000000 and x < 100000000000:
        return 'Large'
    else:
        return 'Huge'
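
The thresholds can be checked on boundary values with a vectorised sketch using `np.select`, which takes the first condition that matches for each row. The sample values are illustrative, and this is an alternative formulation rather than the notebook's own method.

```python
import numpy as np
import pandas as pd

# Illustrative market caps on or near each threshold.
caps = pd.Series([60_660_000.0, 100_000_000.0, 999_999_999.0, 1e9, 1e11])

# Conditions are evaluated in order, so each one only needs its upper bound.
conditions = [caps < 1e8, caps < 1e9, caps < 1e11]
cap_category = np.select(conditions, ['Small', 'Medium', 'Large'], default='Huge')
```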

In [27]:
df['Market Capitalisation Category'] = df['Market Capitalisation'].apply(market_cap_cat)

In [28]:
df

Unnamed: 0,id,first_name,last_name,Ticker,Sector,Market,Stock Name,Market Capitalisation,Purchase Price,file date,Purchase Price Category,Market Capitalisation Category
1,2,Lelia,Gresley,GILD,Health Care,NASDAQ,"Gilead Sciences, Inc.",83790000000.0,$77627.18,2023-01-01,Very High,Large
2,3,Leicester,Roff,NMZ,,NYSE,Nuveen Municipal High Income Opportunity Fund,794980000.0,$40676.73,2023-01-01,Medium,Medium
4,5,Julita,Spradbery,INSI,,NYSE,Insight Select Income Fund,211520000.0,$41908.66,2023-01-01,Medium,Medium
5,6,Steven,McGrale,AP,Capital Goods,NYSE,Ampco-Pittsburgh Corporation,175480000.0,$163.84,2023-01-01,Low,Medium
7,8,Elly,Dono,SCM,,NYSE,Stellus Capital Investment Corporation,213680000.0,$55985.51,2023-01-01,High,Medium
...,...,...,...,...,...,...,...,...,...,...,...,...
11995,996,Brant,Schwartz,BCBP,Finance,NASDAQ,"BCB Bancorp, Inc. (NJ)",174420000.0,$94153.43,2023-08-01,Very High,Medium
11996,997,Terry,Leindecker,XIN,Basic Industries,NYSE,Xinyuan Real Estate Co Ltd,317400000.0,$21159.77,2023-08-01,Low,Medium
11997,998,Zaccaria,Slatcher,ADBE,Technology,NASDAQ,Adobe Systems Incorporated,68190000000.0,$28113.08,2023-08-01,Medium,Large
11998,999,Gayelord,Blenkhorn,AOI,Consumer Services,NYSE,"Alliance One International, Inc.",134830000.0,$84451.49,2023-08-01,Very High,Medium


6. Rank the highest 5 purchases per combination of: file date, Purchase Price Categorisation and Market Capitalisation Categorisation.

In [30]:
# regex=False makes '$' a literal character rather than a regex end-of-string anchor
df['Purchase Price'] = df['Purchase Price'].str.replace('$','', regex=False)

In [31]:
df['Purchase Price'] = df['Purchase Price'].astype(float)

In [32]:
df['rank'] = df.groupby(['file date','Purchase Price Category','Market Capitalisation Category'])['Purchase Price'].rank(method='dense',ascending=False)
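
With `method='dense'`, tied prices share a rank and the next distinct price gets the next integer, so a group can hold more than five rows with a rank of 5 or lower. The small sketch below, on made-up values, contrasts this with `method='min'`.

```python
import pandas as pd

# One group with a tie for the highest price (illustrative values).
demo = pd.DataFrame({
    'group': ['a', 'a', 'a', 'a'],
    'Purchase Price': [90.0, 90.0, 80.0, 70.0],
})
grouped = demo.groupby('group')['Purchase Price']

# dense: ties share a rank and no rank numbers are skipped afterwards.
demo['dense'] = grouped.rank(method='dense', ascending=False)
# min: ties share the lowest rank and the following ranks are skipped.
demo['min'] = grouped.rank(method='min', ascending=False)
```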

In [33]:
df

Unnamed: 0,id,first_name,last_name,Ticker,Sector,Market,Stock Name,Market Capitalisation,Purchase Price,file date,Purchase Price Category,Market Capitalisation Category,rank
1,2,Lelia,Gresley,GILD,Health Care,NASDAQ,"Gilead Sciences, Inc.",83790000000.0,77627.18,2023-01-01,Very High,Large,90.0
2,3,Leicester,Roff,NMZ,,NYSE,Nuveen Municipal High Income Opportunity Fund,794980000.0,40676.73,2023-01-01,Medium,Medium,28.0
4,5,Julita,Spradbery,INSI,,NYSE,Insight Select Income Fund,211520000.0,41908.66,2023-01-01,Medium,Medium,25.0
5,6,Steven,McGrale,AP,Capital Goods,NYSE,Ampco-Pittsburgh Corporation,175480000.0,163.84,2023-01-01,Low,Medium,74.0
7,8,Elly,Dono,SCM,,NYSE,Stellus Capital Investment Corporation,213680000.0,55985.51,2023-01-01,High,Medium,62.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
11995,996,Brant,Schwartz,BCBP,Finance,NASDAQ,"BCB Bancorp, Inc. (NJ)",174420000.0,94153.43,2023-08-01,Very High,Medium,18.0
11996,997,Terry,Leindecker,XIN,Basic Industries,NYSE,Xinyuan Real Estate Co Ltd,317400000.0,21159.77,2023-08-01,Low,Medium,8.0
11997,998,Zaccaria,Slatcher,ADBE,Technology,NASDAQ,Adobe Systems Incorporated,68190000000.0,28113.08,2023-08-01,Medium,Large,69.0
11998,999,Gayelord,Blenkhorn,AOI,Consumer Services,NYSE,"Alliance One International, Inc.",134830000.0,84451.49,2023-08-01,Very High,Medium,53.0


In [34]:
top_5_df = df[df['rank'] <= 5.0]

In [35]:
top_5_df

Unnamed: 0,id,first_name,last_name,Ticker,Sector,Market,Stock Name,Market Capitalisation,Purchase Price,file date,Purchase Price Category,Market Capitalisation Category,rank
13,14,Erminie,Lis,JHD,,NYSE,Nuveen High Income December 2019 Target Term Fund,277100000.0,24418.39,2023-01-01,Low,Medium,2.0
21,22,Davin,Rusling,LINK,Technology,NASDAQ,"Interlink Electronics, Inc.",60660000.0,23502.42,2023-01-01,Low,Small,4.0
53,54,Jany,Hancke,CSOD,Technology,NASDAQ,"Cornerstone OnDemand, Inc.",2090000000.0,74079.42,2023-01-01,High,Large,5.0
106,107,Chico,De Maria,CBA,,NYSE,ClearBridge American Energy MLP Fund Inc.,508590000.0,48507.84,2023-01-01,Medium,Medium,5.0
119,120,Nealson,Hosburn,VSTM,Health Care,NASDAQ,"Verastem, Inc.",74720000.0,74416.07,2023-01-01,High,Small,3.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
11965,966,Denney,Manvell,IDE,,NYSE,"Voya Infrastructure, Industrials and Materials...",296890000.0,24036.38,2023-08-01,Low,Medium,4.0
11967,968,Cecilio,Itzkowicz,BSM,Energy,NYSE,"Black Stone Minerals, L.P.",3000000000.0,24381.90,2023-08-01,Low,Large,1.0
11972,973,Carry,Buff,ONCE,Health Care,NASDAQ,"Spark Therapeutics, Inc.",1820000000.0,98247.23,2023-08-01,Very High,Large,3.0
11973,974,Babs,Surcombe,IDRA,Health Care,NASDAQ,"Idera Pharmaceuticals, Inc.",237150000.0,72016.13,2023-08-01,High,Medium,4.0


In [36]:
top_5_df = top_5_df.sort_values(by=['Purchase Price Category','Market Capitalisation Category','file date','rank'])

In [37]:
top_5_df

Unnamed: 0,id,first_name,last_name,Ticker,Sector,Market,Stock Name,Market Capitalisation,Purchase Price,file date,Purchase Price Category,Market Capitalisation Category,rank
3794,795,Karon,Avramovitz,KHC,Consumer Non-Durables,NASDAQ,The Kraft Heinz Company,108870000000.0,70939.69,2023-02-01,High,Huge,1.0
3620,621,Vallie,Rebeiro,DIS,Consumer Services,NYSE,Walt Disney Company (The),165110000000.0,64358.89,2023-02-01,High,Huge,2.0
2711,712,Angelina,Lotte,AAPL,Technology,NASDAQ,Apple Inc.,741770000000.0,72661.97,2023-03-01,High,Huge,1.0
2161,162,Barnebas,Sapshed,SNY,Health Care,NYSE,Sanofi,123340000000.0,71475.09,2023-03-01,High,Huge,2.0
2412,413,Phebe,McKew,MSFT,Technology,NASDAQ,Microsoft Corporation,540440000000.000061,64599.58,2023-03-01,High,Huge,3.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
5352,353,Welbie,Robbey,ORMP,Health Care,NASDAQ,Oramed Pharmaceuticals Inc.,98910000.0,99965.64,2023-12-01,Very High,Small,1.0
5117,118,Rice,Bentote,QAT,,NASDAQ,iShares MSCI Qatar Capped ETF,38960000.0,99349.08,2023-12-01,Very High,Small,2.0
5664,665,Juditha,Bengefield,APEN,Health Care,NASDAQ,"Apollo Endosurgery, Inc.",72330000.0,98709.71,2023-12-01,Very High,Small,3.0
5958,959,Thomasin,Edmed,PTIE,Health Care,NASDAQ,"Pain Therapeutics, Inc.",27030000.0,98696.95,2023-12-01,Very High,Small,4.0


In [38]:
top_5_df = top_5_df.iloc[:, [11,10,9,3,4,5,6,7,8,12]]
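
The positional indices above depend on the exact column order at this point in the notebook. A sketch of the same selection by column name, which keeps working if a column is added or moved earlier on; the empty frame only reproduces the column layout shown in the outputs.

```python
import pandas as pd

# Column layout of top_5_df before the reorder, as shown in the outputs above.
demo = pd.DataFrame(columns=[
    'id', 'first_name', 'last_name', 'Ticker', 'Sector', 'Market', 'Stock Name',
    'Market Capitalisation', 'Purchase Price', 'file date',
    'Purchase Price Category', 'Market Capitalisation Category', 'rank',
])

output_columns = [
    'Market Capitalisation Category', 'Purchase Price Category', 'file date',
    'Ticker', 'Sector', 'Market', 'Stock Name',
    'Market Capitalisation', 'Purchase Price', 'rank',
]

by_position = demo.iloc[:, [11, 10, 9, 3, 4, 5, 6, 7, 8, 12]]  # original approach
by_name = demo[output_columns]                                  # same columns, by name
```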

In [39]:
top_5_df

Unnamed: 0,Market Capitalisation Category,Purchase Price Category,file date,Ticker,Sector,Market,Stock Name,Market Capitalisation,Purchase Price,rank
3794,Huge,High,2023-02-01,KHC,Consumer Non-Durables,NASDAQ,The Kraft Heinz Company,108870000000.0,70939.69,1.0
3620,Huge,High,2023-02-01,DIS,Consumer Services,NYSE,Walt Disney Company (The),165110000000.0,64358.89,2.0
2711,Huge,High,2023-03-01,AAPL,Technology,NASDAQ,Apple Inc.,741770000000.0,72661.97,1.0
2161,Huge,High,2023-03-01,SNY,Health Care,NYSE,Sanofi,123340000000.0,71475.09,2.0
2412,Huge,High,2023-03-01,MSFT,Technology,NASDAQ,Microsoft Corporation,540440000000.000061,64599.58,3.0
...,...,...,...,...,...,...,...,...,...,...
5352,Small,Very High,2023-12-01,ORMP,Health Care,NASDAQ,Oramed Pharmaceuticals Inc.,98910000.0,99965.64,1.0
5117,Small,Very High,2023-12-01,QAT,,NASDAQ,iShares MSCI Qatar Capped ETF,38960000.0,99349.08,2.0
5664,Small,Very High,2023-12-01,APEN,Health Care,NASDAQ,"Apollo Endosurgery, Inc.",72330000.0,98709.71,3.0
5958,Small,Very High,2023-12-01,PTIE,Health Care,NASDAQ,"Pain Therapeutics, Inc.",27030000.0,98696.95,4.0


7. Output only records with a rank of 1 to 5

In [41]:
top_5_df.to_csv('Preppin Data Outputs/pd2023wk8_output.csv', index=False)
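
`to_csv` raises an error if the target folder does not exist. A minimal sketch of creating the folder first and reading the file back as a check; it writes into a temporary directory and uses a one-row stand-in frame, both of which are assumptions for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# One-row stand-in for top_5_df (illustrative only).
demo = pd.DataFrame({'Ticker': ['KHC'], 'rank': [1]})

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / 'Preppin Data Outputs' / 'pd2023wk8_output.csv'
    # Create the output folder if it is missing, then write without the index.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    demo.to_csv(out_path, index=False)
    # Read the file back to confirm what was written.
    written = pd.read_csv(out_path)
```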