# Add Fossil Classification for a given Holding File

In [1]:
import os
os.chdir('..')

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
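import pandas as pd  # explicit import; pd was otherwise only available through the star imports

from fossil_classification import *
from enrich_holdings import *

# Show every column and row, and format floats with thousands separators
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.options.display.float_format = "{:,.2f}".format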

# Classify quarterly holdings file

In [4]:
holdings_path = "data/downloaded reports/company reports 2022Q1/holdings_for_cls.csv"
classify_holdings(holdings_path)


1. Preparing holding file

** Holdings file for classification **
data/downloaded reports/company reports 2022Q1/holdings_for_cls.csv
columns: Index(['שם המנפיק/שם נייר ערך', 'מספר ני"ע', 'מספר מנפיק', 'שווי',
       'שעור מנכסי אפיק ההשקעה', 'שעור מסך נכסי השקעה', 'holding_type',
       'זירת מסחר', 'תאריך רכישה', 'ערך נקוב', 'שער', 'שעור מערך נקוב מונפק',
       'ענף מסחר', 'SystemName', 'ParentCorpName', 'ReportPeriodDesc'],
      dtype='object')





Holding file ISIN col is: מספר ני"ע
number of ISINs: 8153 out of 27992 rows

Holding file Israel Corp col is: מספר מנפיק
number of Israel Corp Numbers: 18811 out of 27992 rows

2. Preparing mapping files

3. Enriching holding file

Holding file ISIN col is: מספר ני"ע
number of ISINs: 8181 out of 27992 rows

Holding file מספר תאגיד col is: מספר מנפיק
number of מספר תאגידs: 18811 out of 27992 rows

no LEIs in holdings file
מספר ני"עs with matching ISIN: 23744 out of total relevant rows: 19811
מספר תאגידs with matching מספר מנפיק: 19922 out of total relevant rows: 18811
מספר ני"עs with matching מספר מנפיק: 19948 out of total relevant rows: 19811
ISINs with matching מספר מנפיק: 20001 out of total relevant rows: 23744
ISINs with matching LEI: 4673 out of total relevant rows: 23744

4. Preparing previously classified file





Holding file ISIN col is: ISIN
number of ISINs: 115784 out of 138148 rows

Holding file מספר תאגיד col is: מספר תאגיד
number of מספר תאגידs: 87288 out of 138148 rows

Holding file LEI col is: LEI
number of LEIs: 25861 out of 138148 rows
מספר ני"עs with matching ISIN: 117263 out of total relevant rows: 95109
מספר תאגידs with matching מספר מנפיק: 95083 out of total relevant rows: 87288
מספר ני"עs with matching מספר מנפיק: 95092 out of total relevant rows: 95109
ISINs with matching מספר מנפיק: 95106 out of total relevant rows: 117263
ISINs with matching LEI: 25944 out of total relevant rows: 117263

5. Matching holdings with previously classified

1. matching to previously classified by Israeli security number

previous is_fossil coverage
Israeli security numbers previously classified: 19218 out of total holdings: 27992

2. matching to previously classified by ISIN

previous is_fossil coverage
ISINs previously classified: 22951 out of total holdings: 27992

3. matching to previously clas

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  lambda x: x.fillna(x.mean()) if x.mean() in [0, 1] else x)



is_fossil coverage before propagation by מספר ני"ע:
0.00    24431
1.00     3019
nan       542
Name: is_fossil, dtype: int64

is_fossil coverage after propagation by מספר ני"ע:
0.00    24437
1.00     3019
nan       536
Name: is_fossil, dtype: int64

Propagating by ISIN

is_fossil coverage before propagation by ISIN:
0.00    24437
1.00     3019
nan       536
Name: is_fossil, dtype: int64

is_fossil coverage after propagation by ISIN:
0.00    24441
1.00     3019
nan       532
Name: is_fossil, dtype: int64

Propagating by LEI

is_fossil coverage before propagation by LEI:
0.00    24441
1.00     3019
nan       532
Name: is_fossil, dtype: int64

is_fossil coverage after propagation by LEI:
0.00    24441
1.00     3019
nan       532
Name: is_fossil, dtype: int64

Writing results to data/downloaded reports/company reports 2022Q1/holdings_for_cls with fossil classification.csv
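
The propagation steps logged above fill missing is_fossil values from other rows that share the same identifier. The warning text in the output suggests a group's value is copied only when the group's mean is exactly 0 or 1, that is, when all existing labels in the group agree. The following is a minimal sketch of that idea, not the project's implementation: the function name propagate_is_fossil is hypothetical, and it assumes the frame has a unique index and a float is_fossil column.

```python
import numpy as np
import pandas as pd


def propagate_is_fossil(df, id_col):
    """Fill missing is_fossil within groups of rows sharing id_col.

    Hypothetical helper: a label is copied to unlabeled rows only when
    every existing label in the group agrees (group mean is 0 or 1).
    Assumes df has a unique index.
    """
    out = df.copy()  # explicit copy, so assignments never hit a slice view
    has_id = out[id_col].notna()
    # Mean of the known labels per identifier; NaN labels are skipped
    group_mean = out[has_id].groupby(id_col)["is_fossil"].transform("mean")
    # Rows without an identifier get NaN and are therefore never filled
    group_mean = group_mean.reindex(out.index)
    fillable = out["is_fossil"].isna() & group_mean.isin([0.0, 1.0])
    out.loc[fillable, "is_fossil"] = group_mean[fillable]
    return out


holdings = pd.DataFrame({
    "ISIN": ["A", "A", "B", "B", "B", None, "C"],
    "is_fossil": [1.0, np.nan, 0.0, 1.0, np.nan, np.nan, np.nan],
})
propagated = propagate_is_fossil(holdings, "ISIN")
print(propagated["is_fossil"].value_counts(dropna=False))
```

Working on an explicit copy and assigning through .loc is also the usual way to avoid the SettingWithCopyWarning shown in the output.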


# Manual Review
Review the classified output file manually in a Google spreadsheet or Excel.
Download the fully classified file as a CSV, then pass it as holdings_cls_path to update prev_class (see below).

## Tips
1. Look at the output of the script and review conflicting classifications (by ISIN, LEI, or Israeli security number).
2. Look at holdings flagged with is_fossil_conflict=True.
3. Sort by security name, Israeli security number, or ISIN for faster manual classification.
4. Carefully review holdings whose is_fossil value comes only from an FFF name match, as name matching produces false matches.
5. Review holdings from suspicious industries: energy, oil and gas, utilities, materials.
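
Some of the tips above can be applied before opening the spreadsheet. The sketch below collects the rows that most need attention (conflicting or still unclassified) and sorts them by security name. It assumes the classified file has the is_fossil and is_fossil_conflict columns mentioned in this notebook; the function name build_review_queue and its name_col parameter are hypothetical.

```python
import pandas as pd


def build_review_queue(classified, name_col='שם המנפיק/שם נייר ערך'):
    """Return rows needing manual review, sorted by security name.

    Hypothetical helper: a row qualifies when it is flagged as a
    conflict or has no is_fossil value yet.
    """
    # .eq(True) treats missing flags as "not a conflict"
    conflict = classified["is_fossil_conflict"].eq(True)
    unlabeled = classified["is_fossil"].isna()
    queue = classified[conflict | unlabeled]
    # A stable sort keeps the original order among identical names
    return queue.sort_values(name_col, kind="stable")


example = pd.DataFrame({
    "name": ["b", "a", "c", "d"],
    "is_fossil": [1.0, None, 0.0, 0.0],
    "is_fossil_conflict": [True, False, False, None],
})
print(build_review_queue(example, name_col="name"))
```

Sorting by name places different securities of the same issuer next to each other, which makes it easier to classify them consistently.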

# Add classification results to prev_class

In [4]:
holdings_cls_path = "data/downloaded reports/company reports 2021Q4/2021q4 - holdings_for_cls with fossil classification - reviewed.csv"
prev_class_path = "data_sources/prev_class.csv"
update_prev_class(holdings_cls_path, prev_class_path)
# prev_class_fixed = add_all_id_types_to_holdings(prev_class, tlv_s2i, isin2lei)

Adding classifications to prev_class, saving the previous version as data_sources/prev_class backup/prev_class 2022-04-13 03-20-20.csv
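
The message above shows that the previous version of prev_class is saved under a timestamped name before it is updated. A minimal sketch of that backup step follows, assuming the naming pattern visible in the message; the function backup_with_timestamp is hypothetical and is not part of update_prev_class.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_with_timestamp(path, now=None):
    """Copy path into a sibling '<stem> backup' directory.

    Hypothetical helper: the copy is named '<stem> <timestamp><suffix>',
    mirroring the pattern seen in the notebook output.
    """
    src = Path(path)
    backup_dir = src.parent / f"{src.stem} backup"
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H-%M-%S")
    dst = backup_dir / f"{src.stem} {stamp}{src.suffix}"
    shutil.copy2(src, dst)  # copy2 also preserves file metadata
    return dst
```

Taking the backup before any write means a faulty reviewed file can be undone by copying the latest backup over prev_class.csv.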


# Classify fund holdings
Fund data is scraped from https://mayaapi.tase.co.il/api/fund/details?fundId= (with the fund ID appended).
<br>Page address for the fund used here: https://maya.tase.co.il/fund/5132287?view=assets

In [126]:
import json

In [127]:
response_directory = "data/holdings_for_classification/5132287/"
response_path = response_directory + "response.json"
# Load the saved API response; each top-level JSON key becomes a row
fund = pd.read_json(response_path, orient="index")
# "AssetCompostion" (sic) is the key as it is spelled in the response
assets = pd.DataFrame(fund.loc["AssetCompostion"][0]['Assets'])
# Rename the fund columns to the names used in the holdings files
cols_rename = {
    'AssetName': 'שם המנפיק/שם נייר ערך',
    'IdentityCd': 'מספר ני"ע',
    'Id': 'fund_id'
}
assets = assets.rename(cols_rename, axis=1)
# The fund data has no issuer or corporation numbers; fill in placeholders
assets["מספר מנפיק"] = '00'
assets["מספר תאגיד"] = '00'
assets.to_csv(response_directory+"assets.csv", index=False)

In [128]:
classify_holdings(response_directory+"assets.csv")


1.Preparing holding file

** Holdings file for classification **
data/holdings_for_classification/5132287/assets.csv
columns: Index(['מספר ני"ע', 'שם המנפיק/שם נייר ערך', 'AssetTypeName', 'FundPercentage',
       'NisValue', 'Price', 'Quantity', 'BondRank', 'Graph', 'fund_id',
       'ManagerId', 'מספר מנפיק', 'מספר תאגיד'],
      dtype='object')

Holding file ISIN col is: מספר ני"ע
number of ISINs: 35 out of 38 rows

ERROR: no Israel Corp Numbers in holdings file, reverting to default: מספר מנפיק

2.Preparing mapping files





Holding file ISIN col is: מספר ני"ע
number of ISINs: 35 out of 38 rows

no מספר תאגידs in holdings file

no LEIs in holdings file
מספר ני"עs with matching ISIN: 35 out of total relevant rows: 3
מספר תאגידs with matching מספר מנפיק: 3 out of total relevant rows: 38
מספר ני"עs with matching מספר מנפיק: 3 out of total relevant rows: 3
ISINs with matching מספר מנפיק: 3 out of total relevant rows: 35
ISINs with matching LEI: 30 out of total relevant rows: 35





Holding file ISIN col is: ISIN
number of ISINs: 26862 out of 31214 rows

Holding file מספר תאגיד col is: מספר תאגיד
number of מספר תאגידs: 15550 out of 31214 rows

Holding file LEI col is: LEI
number of LEIs: 8250 out of 31214 rows
מספר ני"עs with matching ISIN: 26868 out of total relevant rows: 18368
מספר תאגידs with matching מספר מנפיק: 18126 out of total relevant rows: 15550
מספר ני"עs with matching מספר מנפיק: 18126 out of total relevant rows: 18368
ISINs with matching מספר מנפיק: 18127 out of total relevant rows: 26868
ISINs with matching LEI: 8250 out of total relevant rows: 26868

1. matching to previously classified by Israeli security number

previous is_fossil coverage
Israeli security numbers previously classified: 0 out of total holdings: 38

2. matching to previously classified by ISIN

previous is_fossil coverage
ISINs previously classified: 33 out of total holdings: 38

3. matching to previously classified by issuer number
issuers previously classified: 0 out of total h

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  prop_col_not_null['is_fossil'] = grouped_by_prop_col['is_fossil'].transform(lambda x: x.fillna(x.mean()))
