### 2. Typical and atypical transactions

>   Using your visualizations, find and display examples of typical and atypical business transactions (e.g., mergers, acquisitions, etc.). Can you infer the motivations behind changes in their activity?

## Data loading

In [1]:
import json
import pandas as pd

In [2]:
json_path = './MC3/mc3.json'

In [3]:
# open json file
with open(json_path) as json_file:
    data = json.load(json_file)

In [4]:
# Convert list of dictionaries to DataFrame
df_nodes = pd.DataFrame(data['nodes'])
df_links = pd.DataFrame(data['links'])

### Nodes: Entities -> Persons, Companies

In [5]:
# Entities -> to be split into persons and organizations
df_nodes['type'].value_counts()

type
Entity.Person                           50356
Entity.Organization.Company              7927
Entity.Person.CEO                        1293
Entity.Organization.FishingCompany        600
Entity.Organization.LogisticsCompany      311
Entity.Organization.FinancialCompany       23
Entity.Organization.NewsCompany             5
Entity.Organization.NGO                     5
Name: count, dtype: int64

The nodes contain the following entity types:
- Person
  - CEO
- Organization
  - Company
  - FishingCompany
  - LogisticsCompany
  - FinancialCompany
  - NewsCompany
  - NGO

In [12]:
# DataFrames for each type of node
Organization_Company = df_nodes[df_nodes['type'] == 'Entity.Organization.Company']
Organization_FishingCompany = df_nodes[df_nodes['type'] == 'Entity.Organization.FishingCompany']
Organization_LogisticsCompany = df_nodes[df_nodes['type'] == 'Entity.Organization.LogisticsCompany']
Organization_FinancialCompany = df_nodes[df_nodes['type'] == 'Entity.Organization.FinancialCompany']
Organization_NewsCompany = df_nodes[df_nodes['type'] == 'Entity.Organization.NewsCompany']
Organization_NGO = df_nodes[df_nodes['type'] == 'Entity.Organization.NGO']
Person = df_nodes[df_nodes['type'] == 'Entity.Person']
Person_CEO = df_nodes[df_nodes['type'] == 'Entity.Person.CEO']
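
The eight filters above could also be written as a single `groupby`. The following is only a sketch on a hypothetical toy frame (the `id` column and the sample names are assumptions, not taken from `mc3.json`); it builds one DataFrame per node type, keyed by the type string.

```python
import pandas as pd

# Hypothetical toy nodes; the real df_nodes is loaded from mc3.json
toy_nodes = pd.DataFrame({
    'id': ['Ann Lee', 'Acme Ltd', 'Bob Ray', 'Sea Co'],
    'type': ['Entity.Person', 'Entity.Organization.Company',
             'Entity.Person', 'Entity.Organization.FishingCompany'],
})

# One DataFrame per node type, keyed by the full type string
nodes_by_type = {node_type: frame for node_type, frame in toy_nodes.groupby('type')}

persons = nodes_by_type['Entity.Person']
print(len(persons))
```

A dictionary keeps working if a new node type appears in the data, whereas hand-written filters need one more line per type.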


### Links -> Events, Relationships

In [14]:
# Links -> count by type (events and relationships)
df_links['type'].value_counts()

type
Event.Owns.Shareholdership         39378
Event.Owns.BeneficialOwnership     21531
Event.WorksFor                     14817
Relationship.FamilyRelationship       91
Name: count, dtype: int64

In [15]:
Event_Owns_Shareholdership = df_links[df_links['type'] == 'Event.Owns.Shareholdership']
Event_Owns_BeneficialOwnership = df_links[df_links['type'] == 'Event.Owns.BeneficialOwnership']
Event_WorksFor = df_links[df_links['type'] == 'Event.WorksFor']
Relationship_FamilyRelationship = df_links[df_links['type'] == 'Relationship.FamilyRelationship']

## Analysis

Mergers and acquisitions should show up as ownership changes over time, so the first question is whether the data contains dates.

In [18]:
Event_Owns_BeneficialOwnership.head(5)

| | start_date | type | _last_edited_by | _last_edited_date | _date_added | _raw_source | _algorithm | source | target | key | end_date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 335 | 2018-05-10T00:00:00 | Event.Owns.BeneficialOwnership | Pelagia Alethea Mordoch | 2035-01-01T00:00:00 | 2035-01-01T00:00:00 | Existing Corporate Structure Data | Automatic Import | Laura Newman | Briggs-Wilson | 0 | |
| 338 | 2013-11-30T00:00:00 | Event.Owns.BeneficialOwnership | Pelagia Alethea Mordoch | 2035-01-01T00:00:00 | 2035-01-01T00:00:00 | Existing Corporate Structure Data | Automatic Import | Jillian Morales | Briggs-Wilson | 0 | |
| 339 | 2012-05-04T00:00:00 | Event.Owns.BeneficialOwnership | Pelagia Alethea Mordoch | 2035-01-01T00:00:00 | 2035-01-01T00:00:00 | Existing Corporate Structure Data | Automatic Import | Anna Bailey | Briggs-Wilson | 0 | |
| 340 | 2007-03-16T00:00:00 | Event.Owns.BeneficialOwnership | Pelagia Alethea Mordoch | 2035-01-01T00:00:00 | 2035-01-01T00:00:00 | Existing Corporate Structure Data | Automatic Import | Dawn King | Briggs-Wilson | 0 | |
| 341 | 2016-09-28T00:00:00 | Event.Owns.BeneficialOwnership | Pelagia Alethea Mordoch | 2035-01-01T00:00:00 | 2035-01-01T00:00:00 | Existing Corporate Structure Data | Automatic Import | Dawn King | Fleming-Diaz | 0 | |
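
Rather than eyeballing the first rows, the date coverage could be summarised directly. This is a minimal sketch, using three of the rows shown above as a stand-in for the full `Event_Owns_BeneficialOwnership` frame; the `summary` dictionary and its keys are illustrative names.

```python
import pandas as pd

# Three rows copied from the output above, as a stand-in for the full frame
toy_links = pd.DataFrame({
    'source': ['Laura Newman', 'Dawn King', 'Dawn King'],
    'target': ['Briggs-Wilson', 'Briggs-Wilson', 'Fleming-Diaz'],
    'start_date': ['2018-05-10T00:00:00', '2007-03-16T00:00:00', '2016-09-28T00:00:00'],
    'end_date': [None, None, None],
})

# Parse the ISO strings; missing values become NaT
start = pd.to_datetime(toy_links['start_date'])
end = pd.to_datetime(toy_links['end_date'])

summary = {
    'earliest_start': start.min(),
    'latest_start': start.max(),
    'missing_end_share': end.isna().mean(),  # fraction of rows without an end_date
}
print(summary)
```

If most rows turn out to have no `end_date`, the analysis has to rely on `start_date` alone, which matters for any timeline plot.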


What is "beneficial ownership"?
> In domestic and international commercial law, a beneficial owner is a natural person or persons who ultimately owns or controls an interest in a legal entity or arrangement, such as a company, a trust, or a foundation. Legal owners (i.e. the owners on the record), commonly described as the "registered owners", may hold those interests as beneficial owners or for the benefit of someone else, in which case they may be described as a "nominee".  
>  
> Wikipedia contributors, "Beneficial ownership," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=Beneficial_ownership&oldid=1225725496 (accessed June 5, 2024).

In simple terms, the **real** owner.

What do source and target mean in this dataset?
> Source: The "source" entity is the one that holds ownership, control, or influence over another entity. This could be an individual, a company, or another type of legal entity that has a beneficial ownership interest in the "target" entity.
> 
> Target: The "target" entity is the one that is owned, controlled, or influenced by the "source" entity. This is the entity in which the beneficial ownership interest is held.
>
> OpenAI | ChatGPT
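
Under that reading of source and target, the two directions of the relationship can be counted separately. The sketch below reuses the five source/target pairs from the `head(5)` output above; the variable names are illustrative.

```python
import pandas as pd

# The five (source, target) pairs from the head(5) output above
pairs = pd.DataFrame({
    'source': ['Laura Newman', 'Jillian Morales', 'Anna Bailey', 'Dawn King', 'Dawn King'],
    'target': ['Briggs-Wilson', 'Briggs-Wilson', 'Briggs-Wilson', 'Briggs-Wilson', 'Fleming-Diaz'],
})

# How many distinct owners (sources) each company (target) has
owners_per_target = pairs.groupby('target')['source'].nunique()

# How many distinct companies (targets) each owner (source) holds
targets_per_source = pairs.groupby('source')['target'].nunique()

print(owners_per_target)
print(targets_per_source)
```

A target with many sources is a widely held company, while a source with many targets is an owner with a broad portfolio; the second case is the one explored next.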



Is there a concentration of ownership in the dataset?  
Let's find out by counting how many ownership records each source holds.

In [19]:
# Count ownership records per source
Event_Owns_BeneficialOwnership['source'].value_counts()

source
Sandra Young        92
Anna Davis          92
Cynthia Anderson    91
Kelsey Ortega       91
Breanna Price       91
                    ..
Marc Haney           1
Veronica Proctor     1
Rachel Garcia        1
Eric Klein           1
Carl Martinez        1
Name: count, Length: 16231, dtype: int64

There are a lot of owners! Let's see how many have more than 10 targets.

In [21]:
# Same as above, but keep only sources with more than 10 entries
source_counts = Event_Owns_BeneficialOwnership['source'].value_counts()
moreThan10 = source_counts[source_counts > 10]
len(moreThan10)

34

Not many: only 34 of the 16,231 sources (about 0.2%).
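
The count alone does not say how much of the ownership these heavy sources hold. A rough sketch of a concentration summary follows; the `concentration` helper and the toy series are hypothetical, and the threshold of 10 simply mirrors the cut-off used above.

```python
import pandas as pd

def concentration(sources: pd.Series, threshold: int = 10) -> dict:
    """Summarise how concentrated ownership records are among sources."""
    counts = sources.value_counts()
    heavy = counts[counts > threshold]  # sources with more than `threshold` records
    return {
        'n_sources': len(counts),
        'n_heavy': len(heavy),
        'heavy_source_share': len(heavy) / len(counts),
        'heavy_link_share': heavy.sum() / counts.sum(),
    }

# Hypothetical toy data: A holds 12 records, B holds 3, C holds 1
toy_sources = pd.Series(['A'] * 12 + ['B'] * 3 + ['C'] * 1)
result = concentration(toy_sources)
print(result)
```

On the real data this would be called as `concentration(Event_Owns_BeneficialOwnership['source'])`; a small `heavy_source_share` combined with a large `heavy_link_share` would indicate concentrated ownership.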

I can track the dates for one source at a time, starting with those at the top of the list (more than 10 targets).  
I'd like to plot a timeline for each of them to see whether changes in their targets cluster around particular dates.
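
Before plotting, a simple tabulation can already hint at bursts of activity. The following is a sketch on hypothetical toy dates (the targets `T1` to `T5` and their dates are made up for illustration); it counts how many ownerships start in each year for one source.

```python
import pandas as pd

# Hypothetical ownership records for a single source
toy = pd.DataFrame({
    'target': ['T1', 'T2', 'T3', 'T4', 'T5'],
    'start_date': ['2010-01-05', '2010-06-30', '2010-11-12', '2015-03-01', '2021-08-20'],
})
toy['start_date'] = pd.to_datetime(toy['start_date'])

# Number of ownerships that start in each calendar year
per_year = toy.groupby(toy['start_date'].dt.year)['target'].count()

# Year with the most new ownerships
busiest_year = per_year.idxmax()
print(per_year)
print(busiest_year)
```

A year with far more new ownerships than the others would be a candidate for an atypical transaction such as an acquisition, though that interpretation would still need checking against the data.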

In [22]:
moreThan10.head(5)

source
Sandra Young        92
Anna Davis          92
Cynthia Anderson    91
Kelsey Ortega       91
Breanna Price       91
Name: count, dtype: int64

In [38]:
# Take the first one, source = 'Sandra Young'
# Goal: use HoloViews to plot a timeline with one horizontal bar per target,
# running from start_date to end_date (when an end_date exists), with the target on the y-axis

# Filter the data
source = 'Sandra Young'
df_source = Event_Owns_BeneficialOwnership[Event_Owns_BeneficialOwnership['source'] == source]

In [45]:
# show only columns for targets, start_date and end_date
singleSource = df_source[['target', 'start_date', 'end_date']]

In [47]:
singleSource.sort_values(by='target', inplace=False)

| | target | start_date | end_date |
|---|---|---|---|
| 1966 | Adams-Byrd | 2027-03-30T00:00:00 | |
| 1999 | Alexander, Harris and Rhodes | 2033-03-25T00:00:00 | |
| 1949 | Anderson, Smith and Weber | 2020-09-26T00:00:00 | |
| 2007 | Anderson-Vazquez | 2011-11-27T00:00:00 | |
| 1977 | Andrade and Sons | 2027-08-05T00:00:00 | |
| ... | ... | ... | ... |
| 2037 | Walker LLC | 2032-09-03T00:00:00 | |
| 1982 | Walker-Thompson | 2017-07-09T00:00:00 | |
| 1957 | Walton-Blair | 2029-10-19T00:00:00 | |
| 2026 | Wells, Morales and Gallagher | 2008-09-27T00:00:00 | |


In [43]:
min(singleSource['start_date'])

Timestamp('2005-09-08 00:00:00')

In [41]:
import holoviews as hv
from holoviews import opts
hv.extension('bokeh')

In [42]:
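import pandas as pd
import holoviews as hv
from holoviews import opts
from bokeh.io import output_notebook

# Activate HoloViews with the Bokeh backend
hv.extension('bokeh')
output_notebook()

# Work on an explicit copy so the assignments below do not trigger SettingWithCopyWarning
singleSource = singleSource.copy()
singleSource['start_date'] = pd.to_datetime(singleSource['start_date'])
singleSource['end_date'] = pd.to_datetime(singleSource['end_date'])

# Assumption: records without an end_date are drawn up to 2035-01-01,
# the _date_added value seen in the rows displayed earlier
horizon = pd.Timestamp('2035-01-01')

# Give each target a numeric row position so every bar has a visible height
targets = sorted(singleSource['target'].unique())
position = {target: i for i, target in enumerate(targets)}

# Create one Rectangles element (x0, y0, x1, y1) per ownership record
rectangles = []
for _, row in singleSource.iterrows():
    y = position[row['target']]
    end = row['end_date'] if pd.notna(row['end_date']) else horizon
    rectangles.append(hv.Rectangles([(row['start_date'], y - 0.4, end, y + 0.4)]))

gantt_chart = hv.Overlay(rectangles)

# Customize plot options; yticks maps the row positions back to target names
gantt_chart.opts(
    opts.Rectangles(height=400, width=800, show_grid=True, xrotation=45, ylabel='Target', xlabel='Date',
                    tools=['hover'], show_legend=False, title="Gantt Chart", line_width=2, fill_alpha=0.5,
                    yticks=list(enumerate(targets)))
)

# Display Gantt chart
gantt_chart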




In [33]:
rectangles

[:Rectangles   [x0,y0,x1,y1],
 :Rectangles   [x0,y0,x1,y1],
 :Rectangles   [x0,y0,x1,y1]]