In [1]:
!pip install dataset

Collecting dataset
  Downloading dataset-1.5.2-py2.py3-none-any.whl (18 kB)
Collecting banal>=1.0.1
  Downloading banal-1.0.6-py2.py3-none-any.whl (6.1 kB)
Installing collected packages: banal, dataset
Successfully installed banal-1.0.6 dataset-1.5.2


## What to Expect in this Notebook?

Have you wondered how Reliance Industries is [performing](#Q7) based on its market value and earnings per share? Or have you thought about the [powerful people](#Q2) behind multi-million INR companies in India who decide where shareholders' profits are invested? Where do these companies sit in the wider universe of finance?

What about the link between [Earnings and Market value](#Q6), or between [Earnings and Book Value](#Q8)? The data answers these questions and more. The notebook also [shows how the data can be retrieved](#start) from the websites that we visit regularly.

All of this is made possible with BeautifulSoup and the HTML links on those pages.

(As an aside, none of the scraping code is shown in this notebook. Scraping can be intrusive and potentially risky, so that code has been removed; initial versions of the notebook had the complete code. After reading about organisations that ended up in litigation, I felt it was better to avoid such challenges. The notebook is there to show what is possible and to inspire. I hope it achieves at least 10% of what it intends to do.)

# [And much more... hop on](#questions)

The visuals are presented directly, each one answering the question posed. The entire notebook can be copied, and the visualisation code can be understood and replicated very easily. That is how I have learnt to code, or just to cut, copy and Paaaasssttteee...

## <a id="content"> Contents </a>

### Purpose:

Data science begins with data, and much of the world's data starts with websites and APIs. This notebook shows the complete cycle, from data gathering to analysis and conclusions. The emphasis is on data gathering with the BeautifulSoup and Requests libraries. As I went through the extraction process, an idea struck: what will happen once Web 3.0 kicks in?

The notebook explores the connections between the directors and board members of Indian companies, which shows where the power centres and decision makers sit in Indian organisations. To begin with, we concentrate on the index stocks and then move on to more companies and their directors.

### Major Libraries Involved:

BeautifulSoup, Requests, Plotly Express and Dataset libraries

<a id="questions"></a>

With the director and board member data in hand, here are some questions that can be answered:

[Which company has the most board members, and how many?](#Q1)

[Which board members sit on the boards of multiple companies?](#Q2)

[Which companies are these board members part of?](#Q3)

[How many types of board members are there?](#Q4) 

[Is the number of board members linked to market capitalisation?](#Q5)

[Is Earnings Per Share linked to market capitalisation?](#Q6)

[Is Book Value linked to market capitalisation?](#Q7)

[Is Book Value linked to Earnings Per Share?](#Q8)


In [2]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import requests
import dataset
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse, urldefrag
from joblib import Parallel, delayed
import os
import plotly.express as px
import plotly.graph_objects as go

In [3]:
os.chdir('../input/indian-companies-fundamentaldata')
print('reading companies table to company_link dataset')
company_link = pd.read_sql_table('companies','sqlite:///owners.db')
print('reading mgmt table to management dataset')
management = pd.read_sql_table('mgmt','sqlite:///owners.db')

reading companies table to company_link dataset
reading mgmt table to management dataset
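`pd.read_sql_table` only accepts a SQLAlchemy connectable (the `'sqlite:///owners.db'` URI is turned into one internally), so the cell above fails if SQLAlchemy is not installed. A minimal sketch of an alternative, assuming the same `owners.db` file with its `companies` and `mgmt` tables, uses the standard-library `sqlite3` module with `pd.read_sql_query`. The helper names `list_tables` and `read_table` are hypothetical.

```python
import sqlite3

import pandas as pd


def list_tables(conn: sqlite3.Connection) -> list:
    """Return the names of all tables in a SQLite database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def read_table(conn: sqlite3.Connection, table: str) -> pd.DataFrame:
    """Read a whole table through a plain sqlite3 connection (no SQLAlchemy needed)."""
    # The name is interpolated into the SQL text, so check it against sqlite_master first
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table}")
    return pd.read_sql_query(f'SELECT * FROM "{table}"', conn)


# Usage with the notebook's database (assumes owners.db is in the current directory):
# from contextlib import closing
# with closing(sqlite3.connect('owners.db')) as conn:
#     print(list_tables(conn))
#     company_link = read_table(conn, 'companies')
#     management = read_table(conn, 'mgmt')
```

`contextlib.closing` is used in the usage comment because a `sqlite3` connection's own context manager only commits or rolls back; it does not close the connection.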


In [27]:
management.to_csv('/kaggle/working/management.csv',index=False)

In [11]:
import json

In [18]:
companyJson = company_link.to_json(orient='split',index=False)

In [22]:
parsed = json.loads(companyJson)

In [24]:
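# json.dumps() only returns a string; json.dump() writes the object to an open file.
# The input directory is read-only on Kaggle, so write to /kaggle/working instead.
with open('/kaggle/working/companyData.json', 'w') as f:
    json.dump(parsed, f)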
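The `json.loads` / `json.dump` round trip is not strictly needed: `DataFrame.to_json` can write to a path directly, and `pd.read_json` with the same `orient='split'` reads it back. A minimal sketch, using a small made-up frame and a temporary directory in place of the notebook's data and paths:

```python
import os
import tempfile

import pandas as pd

# Illustrative stand-in for company_link; the values are made up
frame = pd.DataFrame({"company": ["Alpha Ltd", "Beta Ltd"], "EPS": [12.5, 3.25]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "companyData.json")
    # orient='split' stores columns and data separately; index=False leaves the index out
    frame.to_json(path, orient="split", index=False)
    restored = pd.read_json(path, orient="split")

print(restored)
```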

[Back to contents](#content)

### <a id="Q1"> Which company has the most board members, and how many?</a>

In [None]:
# Count the board members listed for each company, largest first
q1 = management.groupby('company')['name'].count().reset_index()
q1.sort_values(by='name',ascending=False,inplace=True)
print('{} has the most board members: {}'.format(q1['company'].values[0],q1['name'].values[0]))

vis1 = px.bar(data_frame=q1[:10],y='company',x='name',color='company')
vis1.update_layout(title='Company with the highest number of board members')
vis1.show()
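The `print` above reports only the first row after sorting, so if several companies share the top count, the others are silently left out. A small sketch, assuming a `management`-like frame with `company` and `name` columns (the sample rows are made up), lists every company tied at the maximum. The helper name `companies_with_most_members` is hypothetical.

```python
import pandas as pd


def companies_with_most_members(mgmt: pd.DataFrame) -> pd.Series:
    """Return the board-member count of every company tied at the maximum."""
    counts = mgmt.groupby("company")["name"].count()
    return counts[counts == counts.max()].sort_index()


sample = pd.DataFrame({
    "company": ["X", "X", "Y", "Y", "Z"],
    "name": ["A", "B", "C", "D", "E"],
})
print(companies_with_most_members(sample))  # X and Y both have 2 members
```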

[Back to contents](#content)

[Back to questions](#questions)

### <a id="Q2"> Which board members sit on the boards of more than one company?</a> 

In [None]:
q2 = management.groupby('name')['id'].count().reset_index()
q2.sort_values(by='id',ascending=False,inplace=True)
q2 = q2[q2.name != '']
print('There are {} members who are part of multiple companies'.format(q2[q2.id > 1].shape[0]))
vis2 = px.bar(data_frame=q2[:10],y='name',x='id',color='name')
vis2.update_layout(title='Directors who are part of more than one company')
vis2.show()
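`count()` on `id` counts rows, so a person listed twice at the same company (for example under two designations) would also look like a member of multiple boards. If that can happen in the scraped table, counting distinct companies with `nunique()` is a safer test. A minimal sketch with made-up rows; the column names follow the `management` table and the helper name `boards_per_person` is hypothetical.

```python
import pandas as pd


def boards_per_person(mgmt: pd.DataFrame) -> pd.Series:
    """Number of distinct companies each named person is linked to, largest first."""
    named = mgmt[mgmt["name"] != ""]  # drop blank names, as the cell above does
    return named.groupby("name")["company"].nunique().sort_values(ascending=False)


sample = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["A", "A", "B", "B"],
    "company": ["X", "X", "X", "Y"],  # A holds two roles at X; B sits on X and Y
    "designation": ["Chairman", "Managing Director", "Director", "Director"],
})
counts = boards_per_person(sample)
print(counts[counts > 1])  # only B sits on more than one board
```

With row counts, A would also show 2 and be reported as a multi-board member.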

[Back to contents](#content)

[Back to questions](#questions)

### <a id="Q3"> Which companies these board members are part of?</a> 

In [None]:
dir_list = q2[q2.id > 1].name.values
#create dictionary to hold the director names, and their companies
multiple_comp = {}
for director in dir_list:
    #create the list of companies and attach the list to the director name  
    multiple_comp[director] = management.loc[management.name == director,'company'].values

# Build a DataFrame from the dictionary: one row per director, one column per company.
# The company lists have different lengths, so pandas pads the shorter rows with None.
director_comp = pd.DataFrame(list(multiple_comp.values()),index=multiple_comp.keys())
director_comp.head()
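The loop above filters `management` once per director. The same lookup can be done in a single pass with `groupby` and `unique`, which also drops a company repeated for the same person. A sketch, assuming `directors` plays the role of `dir_list` from Q2; the sample data is made up and `companies_by_director` is a hypothetical helper.

```python
import pandas as pd


def companies_by_director(mgmt: pd.DataFrame, directors) -> pd.Series:
    """Map each director in `directors` to the distinct companies they are linked to."""
    subset = mgmt[mgmt["name"].isin(directors)]
    return subset.groupby("name")["company"].unique()


sample = pd.DataFrame({
    "name": ["A", "A", "B", "B", "C"],
    "company": ["X", "Y", "X", "X", "Z"],
})
by_director = companies_by_director(sample, ["A", "B"])

# Same wide layout as director_comp: one row per director, shorter rows padded with None
wide = pd.DataFrame([list(c) for c in by_director], index=by_director.index)
print(wide)
```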

[Back to contents](#content)

[Back to questions](#questions)

### <a id="Q4"> How many types of board members are there, and what is their distribution?</a> 

In [None]:
q4 = management.groupby('designation')['company'].count().reset_index()
q4.sort_values(by='company',inplace=True,ascending=False)

vis4 = px.bar(data_frame=q4[:10],y='designation',x='company',color='designation')
vis4.update_layout(title='Types of Board members and their distribution')
vis4.show()
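The bar chart shows raw counts for the ten most common designations. To answer the "how many types" part directly and express the distribution as shares, here is a sketch using `nunique` and `value_counts(normalize=True)`. The whitespace and case normalisation is an assumption about how scraped designations might vary; the sample rows and the helper name `designation_summary` are made up.

```python
import pandas as pd


def designation_summary(mgmt: pd.DataFrame):
    """Return (number of distinct designations, share of rows per designation)."""
    # Collapse repeated spaces and unify case so that 'Independent  director' and
    # 'Independent Director' are counted together (assumed scraping noise)
    cleaned = (mgmt["designation"].str.strip()
               .str.replace(r"\s+", " ", regex=True)
               .str.title())
    return cleaned.nunique(), cleaned.value_counts(normalize=True)


sample = pd.DataFrame({"designation": [
    "Independent Director", "Independent  director", "Chairman", "Director",
]})
n_types, shares = designation_summary(sample)
print(n_types)  # 3
print(shares)
```

One caveat: `str.title` also rewrites acronyms, so "CEO" becomes "Ceo"; `str.lower` avoids that when grouping matters more than display.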

[Back to contents](#content)

[Back to questions](#questions)

### <a id="Q5"> Is the number of board members linked to market capitalisation? </a> 

In [None]:
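def rem_comma(x):
    """Strip thousands separators from a scraped number string such as '1,234.50'."""
    if isinstance(x, str):
        return x.replace(',', '')
    # Non-string values, such as the 0.0 placeholder used for '--', are returned unchanged
    return x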

In [None]:
# The scraped columns are strings: '--' marks a missing value and ',' is a thousands separator.
# Replace '--' with 0.0, strip the commas with rem_comma, then convert each column to float.
for col in ['PtoE', 'PtoB', 'EPS', 'Market_Cap']:
    company_link.loc[company_link[col] == '--', col] = 0.0
    company_link[col] = company_link[col].apply(rem_comma).astype('float')
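A vectorised alternative to `rem_comma` plus `astype`, sketched here with made-up values: `pd.to_numeric(..., errors='coerce')` turns anything that is still not a number, including the '--' marker, into `NaN`. Note that this maps '--' straight to `NaN` rather than to a 0.0 placeholder, so such rows drop out of means and correlations instead of counting as zero. The helper name `to_number` is hypothetical.

```python
import pandas as pd


def to_number(col: pd.Series) -> pd.Series:
    """Convert scraped strings like '1,234.5' to floats; '--' and other junk become NaN."""
    no_commas = col.astype(str).str.replace(",", "", regex=False)
    return pd.to_numeric(no_commas, errors="coerce")


raw = pd.DataFrame({
    "Market_Cap": ["1,234.5", "--", "12"],
    "EPS": ["3.25", "-1.5", "--"],
})
clean = raw.apply(to_number)  # applied column by column
print(clean)
print(clean.dtypes)
```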

In [None]:
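# Attach each company's board-member count from q1. The merged count column ('name')
# comes last, so the positional rename below turns it into 'Board_count'.
company_link = pd.merge(left=company_link,right=q1,on='company',how='left')
company_link.columns = ['id', 'url', 'entry_url', 'company', 'Market_Cap', 'EPS', 'PtoE',
                        'Book_value', 'PtoB','Board_count']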
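With `how='left'`, a company that has no rows in `management` ends up with `NaN` in `Board_count`. Before plotting, it can help to know which companies that affects. A quick check, sketched with made-up frames, uses `indicator=True`, which adds a `_merge` column marking each row as `left_only`, `right_only` or `both`. The helper name `unmatched_companies` is hypothetical.

```python
import pandas as pd


def unmatched_companies(companies: pd.DataFrame, counts: pd.DataFrame) -> list:
    """Companies present in `companies` but missing from the per-company `counts` table."""
    merged = pd.merge(companies, counts, on="company", how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only", "company"].tolist()


companies = pd.DataFrame({"company": ["Alpha", "Beta", "Gamma"]})
counts = pd.DataFrame({"company": ["Alpha", "Gamma"], "name": [12, 9]})
print(unmatched_companies(companies, counts))  # ['Beta']
```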


In [None]:
vis5 = px.scatter(data_frame=company_link,x='Market_Cap',y='Board_count',
                  color='Market_Cap',color_continuous_scale=px.colors.cyclical.Phase_r,
                 size_max=25,size = 'Board_count',text='company')
vis5.update_layout(title='Board Count and Market Capitalisation',height=1000)
vis5.update_traces(mode="markers", hovertemplate=None)
vis5.update_xaxes(title="Market Capitalisation Cr INR",type='log')
vis5.show()
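The scatter plots answer the "are they linked?" questions by eye. A single number can back that up: a rank (Spearman) correlation is insensitive to the heavy skew in market capitalisation that the log axis is compensating for. A minimal sketch, computed as the Pearson correlation of ranks so that SciPy is not needed; `rank_correlation` is a hypothetical helper and the sample data is made up.

```python
import pandas as pd


def rank_correlation(df: pd.DataFrame, x: str, y: str) -> float:
    """Spearman correlation of two columns, computed as the Pearson correlation of their ranks.

    Rows where either value is missing are dropped before ranking; ties get average ranks.
    """
    pair = df[[x, y]].dropna()
    return pair[x].rank().corr(pair[y].rank())


# In this made-up sample the two columns have identical ranks once the missing row is dropped
sample = pd.DataFrame({
    "Market_Cap": [500.0, 12000.0, 250000.0, 3000.0, None],
    "Board_count": [7, 10, 14, 9, 11],
})
print(rank_correlation(sample, "Market_Cap", "Board_count"))
```

On the notebook's data this would be called as, for example, `rank_correlation(company_link, 'Market_Cap', 'EPS')`. Rows where '--' was stored as 0.0 are ranked as genuine zeros, so filtering them out first may give a fairer figure.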

[Back to contents](#content)

[Back to questions](#questions)

### <a id="Q6"> Is Earnings Per Share linked to market capitalisation? </a> 

In [None]:
vis6 = px.scatter(data_frame=company_link,x='Market_Cap',y='EPS',
                  color='Market_Cap',color_continuous_scale=px.colors.cyclical.Phase_r,
                 size_max=25,size = 'Board_count',text='company')
vis6.update_layout(title='EPS vs Market Capitalisation',height=1000)
vis6.update_traces(mode="markers", hovertemplate=None)
vis6.update_xaxes(title="Market Capitalisation Cr INR",type='log')
vis6.update_yaxes(title="Earnings Per Share",type='log')
vis6.show()
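Both axes of this chart are logarithmic, and a log axis has no place for zero or negative values, so loss-making companies (negative EPS) and any '--' entries stored as 0.0 do not appear as points. A short sketch to count how many rows that affects, assuming numeric columns as produced by the cleaning step (sample values made up; `hidden_on_log_axes` is a hypothetical helper):

```python
import pandas as pd


def hidden_on_log_axes(df: pd.DataFrame, cols) -> int:
    """Count rows with a missing or non-positive value in any column drawn on a log axis."""
    values = df[list(cols)]
    hidden = values.isna() | (values <= 0)
    return int(hidden.any(axis=1).sum())


sample = pd.DataFrame({
    "Market_Cap": [1500.0, 0.0, 82000.0, 4300.0],
    "EPS": [12.4, 3.1, -2.7, None],
})
print(hidden_on_log_axes(sample, ["Market_Cap", "EPS"]))  # 3
```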

[Back to contents](#content)

[Back to questions](#questions)

### <a id="Q7"> Is Book Value linked to market capitalisation? </a> 

In [None]:
vis7 = px.scatter(data_frame=company_link,x='Market_Cap',y='Book_value',
                  color='Market_Cap',color_continuous_scale=px.colors.cyclical.Phase_r,
                 size_max=25,size = 'Board_count',text='company')
vis7.update_layout(title='Book Value per share vs Market Capitalisation',height=1000)
vis7.update_traces(mode="markers", hovertemplate=None)
vis7.update_xaxes(title="Market Capitalisation Cr INR",type='log')
vis7.update_yaxes(title="Book value per Share")
vis7.show()
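Market capitalisation is a company-wide total while book value is per share, so this chart compares quantities on different scales. The scraped ratios allow a consistency check at the per-share level: by definition P/E = price / EPS and P/B = price / book value per share, so `PtoE * EPS` and `PtoB * Book_value` should both recover roughly the same share price. A sketch, assuming the column names used in this notebook and a numeric `Book_value` (the cleaning cell above does not convert that column); the sample values and the helper name `implied_prices` are made up.

```python
import pandas as pd


def implied_prices(df: pd.DataFrame) -> pd.DataFrame:
    """Back out the share price from P/E and from P/B, and report their relative gap."""
    out = pd.DataFrame(index=df.index)
    out["price_from_PE"] = df["PtoE"] * df["EPS"]
    out["price_from_PB"] = df["PtoB"] * df["Book_value"]
    out["rel_gap"] = (out["price_from_PE"] - out["price_from_PB"]).abs() / out["price_from_PB"]
    return out


sample = pd.DataFrame({
    "EPS": [5.0, 40.0],
    "PtoE": [20.0, 25.0],
    "Book_value": [50.0, 400.0],
    "PtoB": [2.0, 2.0],
}, index=["Alpha", "Beta"])
print(implied_prices(sample))
```

A large gap may point to a parsing error, or to ratios computed on different bases (for example standalone versus consolidated figures).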

[Back to contents](#content)

[Back to questions](#questions)

### <a id="Q8"> Is the Book Value linked to Earnings per Share? </a> 

In [None]:
vis8 = px.scatter(data_frame=company_link,x='EPS',y='Book_value',
                  color='EPS',color_continuous_scale=px.colors.cyclical.Phase_r,
                 size_max=25,size = 'Board_count',text='company')
vis8.update_layout(title='Book Value per share vs Earnings Per Share',height=1000)
vis8.update_traces(mode="markers", hovertemplate=None)
vis8.update_xaxes(title="Earnings Per Share",type='log')
vis8.update_yaxes(title="Book value per Share")
vis8.show()
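EPS divided by book value per share is approximately the return on equity (net profit over shareholders' funds, with the share count cancelling out), so the spread of points in this chart reflects differences in ROE. A sketch computing it per company; it assumes numeric `EPS` and `Book_value` columns and treats zero or negative book values as undefined. The sample values and the helper name `roe_proxy` are made up.

```python
import pandas as pd


def roe_proxy(df: pd.DataFrame) -> pd.Series:
    """EPS / book value per share: an approximate return on equity, as a fraction."""
    book = df["Book_value"].where(df["Book_value"] > 0)  # zero or negative equity -> NaN
    return df["EPS"] / book


sample = pd.DataFrame({
    "EPS": [25.0, 4.0, -3.0],
    "Book_value": [125.0, 0.0, 60.0],
}, index=["Alpha", "Beta", "Gamma"])
print(roe_proxy(sample))  # Alpha: 0.2, Beta: NaN (zero book value), Gamma: -0.05
```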

[Back to contents](#content)

[Back to questions](#questions)

The data and the visuals raise more questions than they answer, and there is much more hidden in the data. Appropriate feature engineering might bring more of it out, and creative, provocative questions make it more interesting. Thanks for joining this journey.