<a id=top></a>

# **Table of Contents**

<b>
    <details>
        <summary>
            <a href="#modules" style="font-size: xx-large">1. Module Imports</a>
            <ul>Note: third-party libraries >>> custom modules >>> (everything up to here lives outside this ipynb) >>> custom functions (inside this ipynb)</ul>
        </summary>
    </details>
    <details>
        <summary>
            <a href="#data" style="font-size: xx-large">2. Original Data Import</a>
        </summary>
    </details>
    <details>
        <summary>
            <a href="#patentcount" style="font-size: xx-large">3. Patent Counts</a>
        </summary>
        <table></table>
    </details>
    <details>
        <summary>
            <a href="#calculateindicator" style="font-size: xx-large">4. Indicators</a>
        </summary>
    </details>
    <details>
        <summary>
            <a href="#output" style="font-size: xx-large">5. Output to Files</a>
        </summary>
    </details>
</b>


---


<a id=modules></a>

## **1. Module Imports**


In [1]:
import pandas as pd
import numpy as np
from ecomplexity import ecomplexity

# Display floats with 6 decimal places
pd.options.display.float_format = '{:.6f}'.format


In [2]:
# Import custom modules
from Scripts.kci import df_to_kci as dtk
from Scripts.vizualize import rank as vr
from Scripts.vizualize import distribution as vd


In [3]:
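def weight_by_ipc(reg_num_df: pd.DataFrame) -> pd.DataFrame:
    """Fractionally count patents per (patent holder, IPC class).

    A patent that a holder registered under k distinct IPC classes contributes
    1/k to each class. Each weight is rounded to two decimals, so a patent
    spanning three classes contributes 0.33 * 3 = 0.99 rather than exactly 1.
    """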
    ipc_weight_df = reg_num_df.copy()\
                               .groupby(['right_person_name', 'reg_num'])\
                               [['ipc_class']].nunique().reset_index(drop=False)\
                               .rename(columns={'ipc_class':'weight'})
    ipc_weight_df['weight'] = round(1 / ipc_weight_df['weight'], 2)

    weighted_reg_num_df = pd.merge(reg_num_df, ipc_weight_df, 
                                   on=['reg_num', 'right_person_name'], 
                                   how='left')
    weighted_reg_num_df = weighted_reg_num_df.drop_duplicates()\
                                .groupby(['right_person_name', 'ipc_class'])[['weight']].sum()\
                                .sort_values('weight', ascending=False)\
                                .reset_index(drop=False)\
                                .rename(columns={'weight':'reg_num'})
    return weighted_reg_num_df
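
As a quick illustration of the fractional counting that `weight_by_ipc` performs, the sketch below applies the same idea to a tiny hypothetical table (the holder names, registration numbers and the `weight_exact` / `weight_rounded` columns are made up for this example). It keeps both exact and rounded weights to show how much mass the two-decimal rounding drops.

```python
import pandas as pd

# Hypothetical toy data: one row per (patent, holder, IPC class).
toy_df = pd.DataFrame({
    'reg_num': ['p1', 'p2', 'p2', 'p3', 'p3', 'p3'],
    'right_person_name': ['A', 'A', 'A', 'B', 'B', 'B'],
    'ipc_class': ['G06F', 'G06F', 'H04N', 'G06F', 'H04N', 'H01L'],
})

# Each patent is split evenly across its distinct IPC classes.
n_classes = toy_df.groupby(['right_person_name', 'reg_num'])['ipc_class'].transform('nunique')
toy_df['weight_exact'] = 1 / n_classes
toy_df['weight_rounded'] = (1 / n_classes).round(2)

# Fractional patent count per (holder, IPC class), using the exact weights.
fractional_counts = (toy_df.groupby(['right_person_name', 'ipc_class'])['weight_exact']
                           .sum()
                           .reset_index(name='reg_num'))
print(fractional_counts)

# Three patents in total: exact weights sum to 3, rounded weights to 2.99.
total_exact = toy_df['weight_exact'].sum()
total_rounded = toy_df['weight_rounded'].sum()
print(total_exact, total_rounded)
```

The small shortfall in the rounded total is the same effect that makes the weighted patent total in section 3 a non-integer.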

In [4]:
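def extract_top_p(reg_num_df: pd.DataFrame, 
                  top_p: int) -> pd.DataFrame:
    """Keep the rows of the patent holders in the top `top_p` percent by total patent count.

    The number of holders kept is n_holders * top_p // 100 + 1.
    """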
    top_p_right_person_list = reg_num_df.copy().groupby(['right_person_name'])[['reg_num']].sum()\
                                               .sort_values(['reg_num'], ascending=False)\
                                               .reset_index(drop=False)\
                                               .head((reg_num_df['right_person_name'].nunique()*top_p//100)+1)\
                                               ['right_person_name'].to_list()
    
    reg_num_top_p_df = reg_num_df[reg_num_df['right_person_name'].isin(top_p_right_person_list)].copy()
    # reg_num_top_p_df['segment'] = reg_num_top_p_df['period'].str[:4].astype(np.int64)
    return reg_num_top_p_df
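
The cutoff used by `extract_top_p` is plain integer arithmetic, so it can be checked by hand. The sketch below wraps it in a hypothetical helper, `n_top_holders`, which is not part of the notebook. With the 64,591 holders in the full data set and `top_p = 3`, it reproduces the 1,938 holders reported in section 3.

```python
def n_top_holders(n_holders: int, top_p: int) -> int:
    """Number of holders kept by head(): floor(n * p / 100) + 1 (hypothetical helper)."""
    return n_holders * top_p // 100 + 1

n_all = n_top_holders(64591, 3)    # full data set, top 3%
n_small = n_top_holders(100, 3)    # the +1 keeps 4 holders out of 100, not 3
print(n_all, n_small)
```

Because of the `+ 1`, at least one holder is always kept, and the selection is slightly larger than a strict `top_p` percent.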

In [5]:
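def kh1_ki1(c_df: pd.DataFrame) -> pd.DataFrame:
    """Add kh_1 and ki_1, the first-order averages of the method of reflections.

    kh_1: mean ubiquity of the IPC classes in which a holder has mcp == 1.
    ki_1: mean diversity of the holders that have mcp == 1 in an IPC class.
    """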
    kh1_ki1_df = pd.merge(c_df.copy(), 
                        c_df[c_df['mcp']==1].groupby(['right_person_name'])[['ubiquity']].sum().reset_index(drop=False).copy().rename(columns={'ubiquity':'kh_1'}), 
                        on=['right_person_name'], how='left')
    kh1_ki1_df = pd.merge(kh1_ki1_df.copy(), 
                        c_df[c_df['mcp']==1].groupby(['ipc_class'])[['diversity']].sum().reset_index(drop=False).copy().rename(columns={'diversity':'ki_1'}), 
                        on=['ipc_class'], how='left')
    kh1_ki1_df['kh_1'] = kh1_ki1_df['kh_1'] / kh1_ki1_df['diversity']
    kh1_ki1_df['ki_1'] = kh1_ki1_df['ki_1'] / kh1_ki1_df['ubiquity']
    return kh1_ki1_df
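
To make the two averages in `kh1_ki1` concrete, here is a minimal sketch on a hypothetical 3 x 3 binary matrix `M` (rows are holders, columns are IPC classes, 1 where `mcp == 1`). It uses NumPy matrix products instead of the merges above, which should give the same quantities under the stated definitions.

```python
import numpy as np

# Hypothetical binary matrix: rows = holders, columns = IPC classes (1 where mcp == 1).
M = np.array([
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
])

diversity = M.sum(axis=1)   # number of classes per holder
ubiquity = M.sum(axis=0)    # number of holders per class

# kh_1: average ubiquity of the classes a holder is active in.
kh_1 = (M @ ubiquity) / diversity
# ki_1: average diversity of the holders active in a class.
ki_1 = (M.T @ diversity) / ubiquity

print(kh_1)   # [2.  2.5 3. ]
print(ki_1)   # [2.  2.5 3. ]
```

The most diversified holder (first row) has the lowest `kh_1` here because it also covers the rare class, while the holder with a single class only covers the most ubiquitous one.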


---


<a id=data></a>

## **2. Original Data Import**


In [6]:
year_start = 1981
year_end = 2010
year_range = 10

ar = 'app'
ipc_digit = 4
weight = 'fraction'
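
With these settings, the period labels and the IPC truncation used in the following cells can be previewed in isolation. This is only a sketch: the raw IPC string `'G06F17/30'` is a hypothetical example, since the exact format of `ipc_class` in the source CSV is not shown here.

```python
year_start, year_end, year_range = 1981, 2010, 10
ipc_digit = 4

# One label per non-overlapping window of `year_range` years.
periods = [f'{year}-{year + year_range - 1}'
           for year in range(year_start, year_end + 1, year_range)]
print(periods)   # ['1981-1990', '1991-2000', '2001-2010']

# Truncating to the first `ipc_digit` characters keeps the IPC subclass level.
raw_ipc = 'G06F17/30'   # hypothetical raw value
ipc_short = raw_ipc[:ipc_digit]
print(ipc_short)        # G06F
```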


In [7]:
# Whole period
all_df = pd.read_csv(f'../../1_DataFiltering/Data/Dealed/{ar}.csv', 
                     encoding='utf-8', 
                     sep=',', 
                     usecols=['reg_num', 
                              'right_person_name', 
                              f'{ar}_year', 
                              'ipc_class'], 
                     dtype={'reg_num':str, 
                            'right_person_name':str, 
                            f'{ar}_year':np.int64, 
                            'ipc_class':str})

all_df['ipc_class'] = all_df['ipc_class'].str[:ipc_digit]
all_df = all_df[all_df[f'{ar}_year'].isin(range(year_start, year_end+1))]\
               .drop_duplicates()

print('Patents (before degree reduction):', all_df['reg_num'].nunique())
print('Patent holders (before degree reduction):', all_df['right_person_name'].nunique())
print('IPC classes (before degree reduction):', all_df['ipc_class'].nunique())
display(all_df.head())


# Each period
sep_year_df_dict = {}

for year in range(year_start, year_end+1, year_range):
    sep_year_df_dict[f'{year}-{year+year_range-1}'] = all_df[all_df[f'{ar}_year'].isin(range(year, year+year_range))]
    print(f'=============={year}-{year+year_range-1}==============')
    print('Patents (before degree reduction):', sep_year_df_dict[f'{year}-{year+year_range-1}']['reg_num'].nunique())
    print('Patent holders (before degree reduction):', sep_year_df_dict[f'{year}-{year+year_range-1}']['right_person_name'].nunique())
    print('IPC classes (before degree reduction):', sep_year_df_dict[f'{year}-{year+year_range-1}']['ipc_class'].nunique())
    print('=====================================\n')


Patents (before degree reduction): 3189444
Patent holders (before degree reduction): 64591
IPC classes (before degree reduction): 630


Unnamed: 0,reg_num,right_person_name,app_year,ipc_class
0,5684492,ＤＲＣ合同会社,2010,G10H
1,5684512,株式会社ＩＨＩエアロスペース,2010,B62D
2,5684598,株式会社オカムラ,2010,A47C
3,5684620,三井化学株式会社,2010,H01M
12,5684736,シロキ工業株式会社,2010,B60N


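==============1981-1990==============
Patents (before degree reduction): 832302
Patent holders (before degree reduction): 22671
IPC classes (before degree reduction): 613
=====================================

==============1991-2000==============
Patents (before degree reduction): 1058476
Patent holders (before degree reduction): 34375
IPC classes (before degree reduction): 616
=====================================

==============2001-2010==============
Patents (before degree reduction): 1298666
Patent holders (before degree reduction): 37076
IPC classes (before degree reduction): 617
=====================================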



<a href="#top">Back to top</a>

---


<a id=patentcount></a>

## **3. Patent Counts**


In [8]:
# Top percentage of patent holders to keep
top_p = 3

In [9]:
# Whole period
if weight == 'fraction':
    all_reg_num_df = weight_by_ipc(all_df)
else:
    all_reg_num_df = all_df.groupby(['right_person_name', 'ipc_class'])[['reg_num']].nunique().reset_index(drop=False)
all_reg_num_df['segment'] = f'{year_start}-{year_end}'
all_reg_num_top_p_df = extract_top_p(all_reg_num_df, top_p)

display(all_reg_num_top_p_df.head())


# Per period
sep_year_reg_num_df_dict = {}
sep_year_reg_num_top_p_df_dict = {}
for period, sep_year_df in sep_year_df_dict.items():
    if weight == 'fraction':
        sep_year_reg_num_df_dict[period] = weight_by_ipc(sep_year_df)
    else:
        # Unweighted count of distinct patents per holder and IPC class
        sep_year_reg_num_df_dict[period] = sep_year_df.groupby(['right_person_name', 'ipc_class'])[['reg_num']].nunique().reset_index(drop=False)
    sep_year_reg_num_df_dict[period]['segment'] = period
    sep_year_reg_num_top_p_df_dict[period] = extract_top_p(sep_year_reg_num_df_dict[period], top_p)

sep_year_reg_num_top_p_df = pd.concat(sep_year_reg_num_top_p_df_dict.values(), 
                                      axis='index')

display(sep_year_reg_num_top_p_df)


Unnamed: 0,right_person_name,ipc_class,reg_num,segment
0,株式会社リコー,G03G,14615.0,1981-2010
1,キヤノン株式会社,G03G,13958.5,1981-2010
2,富士通株式会社,G06F,13040.0,1981-2010
3,キヤノン株式会社,H04N,12151.5,1981-2010
4,日本電気株式会社,G06F,11593.5,1981-2010


Unnamed: 0,right_person_name,ipc_class,reg_num,segment
0,富士通株式会社,G06F,4969.000000,1981-1990
1,パナソニツクホールデイングス株式会社,G11B,4274.000000,1981-1990
2,キヤノン株式会社,G03G,4071.000000,1981-1990
3,日本電気株式会社,H01L,3517.500000,1981-1990
4,ソニーグループ株式会社,G11B,3467.000000,1981-1990
...,...,...,...,...
149504,太平洋セメント株式会社,E05B,0.330000,2001-2010
149505,株式会社ＧＳユアサ,G02B,0.330000,2001-2010
149506,株式会社ＧＳユアサ,F21Y,0.330000,2001-2010
149508,アルパイン株式会社,G03B,0.330000,2001-2010


In [13]:
print('Patents (before degree reduction):', all_reg_num_top_p_df.groupby('right_person_name')['reg_num'].sum().reset_index(drop=False)['reg_num'].sum())
print('Patent holders (before degree reduction):', all_reg_num_top_p_df['right_person_name'].nunique())
print('IPC classes (before degree reduction):', all_reg_num_top_p_df['ipc_class'].nunique())
display(all_reg_num_top_p_df.head())




Patents (before degree reduction): 3177824.82
Patent holders (before degree reduction): 1938
IPC classes (before degree reduction): 627


Unnamed: 0,right_person_name,ipc_class,reg_num,segment
0,株式会社リコー,G03G,14615.0,1981-2010
1,キヤノン株式会社,G03G,13958.5,1981-2010
2,富士通株式会社,G06F,13040.0,1981-2010
3,キヤノン株式会社,H04N,12151.5,1981-2010
4,日本電気株式会社,G06F,11593.5,1981-2010


In [19]:
print(all_df[all_df['right_person_name'].isin(all_reg_num_top_p_df['right_person_name'])]['reg_num'].nunique())
for p, d in sep_year_df_dict.items():
    print(d[d['right_person_name'].isin(sep_year_reg_num_top_p_df_dict[p]['right_person_name'])]['reg_num'].nunique())

2894529
718610
908704
1129718
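
The four counts above can be compared with the totals printed in section 2 to see how much of the data the top 3% of holders cover. The sketch below does that arithmetic. It assumes the first count refers to the whole period and the remaining three follow the period order of `sep_year_df_dict`; the dictionary name `coverage_counts` is introduced here for illustration only.

```python
# (patents held by the top 3% of holders, all patents), copied from the outputs above.
coverage_counts = {
    '1981-2010': (2894529, 3189444),
    '1981-1990': (718610, 832302),
    '1991-2000': (908704, 1058476),
    '2001-2010': (1129718, 1298666),
}

coverage = {period: top / total for period, (top, total) in coverage_counts.items()}
for period, share in coverage.items():
    print(f'{period}: {share:.1%}')
```

Under that assumption, the selected holders account for roughly 86 to 91 percent of the registered patents in each window.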


<a href="#top">Back to top</a>

---


<a id=calculateindicator></a>

## **4. Indicators**


In [10]:
trade_cols = {'time':'segment', 'loc':'right_person_name', 'prod':'ipc_class', 'val':'reg_num'}
rename_col_dict = {'eci':'kci', 'pci':'tci'}
col_order_list = ['segment', 'right_person_name', 'ipc_class', 'reg_num', 'rca', 'mcp', 'diversity', 'ubiquity', 'kci', 'tci']
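
The `rca`, `mcp`, `diversity` and `ubiquity` columns listed in `col_order_list` can be illustrated on a small hypothetical count matrix. The sketch below assumes the standard Balassa definition of revealed comparative advantage and a threshold of 1; it does not call the `ecomplexity` package, whose internals may differ in detail (for example in how a value exactly equal to the threshold is treated).

```python
import pandas as pd

# Hypothetical holder x IPC-class patent counts.
counts = pd.DataFrame(
    [[8, 1, 1],
     [2, 3, 5]],
    index=['holder_A', 'holder_B'],
    columns=['G06F', 'H04N', 'H01L'],
)

# RCA = (class share within the holder) / (class share in the whole data set).
holder_share = counts.div(counts.sum(axis=1), axis=0)
class_share = counts.sum(axis=0) / counts.values.sum()
rca = holder_share.div(class_share, axis=1)

# Binarize at the threshold (inclusive comparison is an assumption here).
mcp = (rca >= 1).astype(int)

diversity = mcp.sum(axis=1)   # classes in which a holder is specialized
ubiquity = mcp.sum(axis=0)    # holders specialized in a class
print(rca.round(3))
print(mcp)
```

In this toy case `holder_A` is specialized only in `G06F`, while `holder_B` is specialized in the other two classes.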


In [11]:
# Whole period
all_c_df = ecomplexity(all_reg_num_top_p_df, 
                       cols_input = trade_cols, 
                       rca_mcp_threshold = 1)
all_c_df = all_c_df[all_c_df['reg_num'] > 0]\
                    .rename(columns=rename_col_dict)\
                    [col_order_list]
all_c_df = kh1_ki1(all_c_df)

display(all_c_df.head())
display(all_c_df.describe())
print(all_c_df.info())


# Each period
sep_year_c_df = ecomplexity(sep_year_reg_num_top_p_df, 
                            cols_input = trade_cols, 
                            rca_mcp_threshold = 1)
sep_year_c_df = sep_year_c_df[sep_year_c_df['reg_num'] > 0]\
                            .rename(columns=rename_col_dict)\
                            [col_order_list]
sep_year_c_df = pd.concat([kh1_ki1(sep_year_c_df[sep_year_c_df['segment'] == segment]) for segment in sep_year_c_df['segment'].unique()], 
                          axis='index')

for segment in sep_year_c_df['segment'].unique():
    display(sep_year_c_df[sep_year_c_df['segment'] == segment].head())
    display(sep_year_c_df[sep_year_c_df['segment'] == segment].describe())
    print(sep_year_c_df[sep_year_c_df['segment'] == segment].info())
    print('\n')


1981-2010


Unnamed: 0,segment,right_person_name,ipc_class,reg_num,rca,mcp,diversity,ubiquity,kci,tci,kh_1,ki_1
0,1981-2010,あすか製薬株式会社,A23D,1.0,18.677706,1,17,61,1.667732,3.558648,254.0,47.327869
1,1981-2010,あすか製薬株式会社,A23K,5.0,112.979686,1,17,165,1.667732,2.657556,254.0,52.606061
2,1981-2010,あすか製薬株式会社,A23L,2.0,2.911133,1,17,286,1.667732,2.265449,254.0,43.055944
3,1981-2010,あすか製薬株式会社,A47C,1.0,2.873728,1,17,153,1.667732,-0.081085,254.0,45.27451
4,1981-2010,あすか製薬株式会社,A61B,1.0,0.503522,0,17,160,1.667732,0.279108,254.0,48.63125


Unnamed: 0,reg_num,rca,mcp,diversity,ubiquity,kci,tci,kh_1,ki_1
count,122410.0,122410.0,122410.0,122410.0,122410.0,122410.0,122410.0,122410.0,122410.0
mean,25.9605,15.451603,0.599167,57.339441,210.744041,-0.034436,-0.098033,193.844806,51.900273
std,189.552438,116.79417,0.490069,32.834788,110.138145,0.907452,1.352366,35.882888,9.228387
min,0.33,0.000792,0.0,1.0,1.0,-3.25886,-3.753788,47.0,15.265625
25%,1.0,0.464922,0.0,33.0,129.0,-0.556538,-0.804485,168.576471,46.069307
50%,2.0,1.601253,1.0,51.0,191.0,0.006653,-0.023819,192.230769,50.48954
75%,8.5,5.928452,1.0,74.0,284.0,0.511982,0.740236,218.4,55.419795
max,14615.0,10772.6527,1.0,220.0,478.0,2.427175,3.558648,327.785714,136.0


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 122410 entries, 0 to 122409
Data columns (total 12 columns):
 #   Column             Non-Null Count   Dtype  
---  ------             --------------   -----  
 0   segment            122410 non-null  object 
 1   right_person_name  122410 non-null  object 
 2   ipc_class          122410 non-null  object 
 3   reg_num            122410 non-null  float64
 4   rca                122410 non-null  float64
 5   mcp                122410 non-null  int32  
 6   diversity          122410 non-null  int32  
 7   ubiquity           122410 non-null  int32  
 8   kci                122410 non-null  float64
 9   tci                122410 non-null  float64
 10  kh_1               122410 non-null  float64
 11  ki_1               122410 non-null  float64
dtypes: float64(6), int32(3), object(3)
memory usage: 9.8+ MB
None
1981-1990
1991-2000
2001-2010


Unnamed: 0,segment,right_person_name,ipc_class,reg_num,rca,mcp,diversity,ubiquity,kci,tci,kh_1,ki_1
0,1981-1990,いすゞ自動車株式会社,A01K,0.5,0.374916,0,71,112,-0.409268,1.113735,69.070423,54.348214
1,1981-1990,いすゞ自動車株式会社,A23K,0.5,1.03834,1,71,63,-0.409268,2.838075,69.070423,50.380952
2,1981-1990,いすゞ自動車株式会社,A47C,4.0,2.743898,1,71,42,-0.409268,-0.036169,69.070423,54.166667
3,1981-1990,いすゞ自動車株式会社,A61B,1.0,0.093671,0,71,49,-0.409268,-0.272298,69.070423,43.55102
4,1981-1990,いすゞ自動車株式会社,A61L,1.0,0.401533,0,71,157,-0.409268,1.605396,69.070423,51.095541


Unnamed: 0,reg_num,rca,mcp,diversity,ubiquity,kci,tci,kh_1,ki_1
count,45629.0,45629.0,45629.0,45629.0,45629.0,45629.0,45629.0,45629.0,45629.0
mean,17.080584,12.675098,0.635495,61.469438,87.849591,-0.059948,-0.097,80.665918,55.737957
std,90.471393,67.045899,0.481296,32.557831,46.353482,0.915388,1.302044,14.764003,9.638702
min,0.5,0.002319,0.0,1.0,1.0,-2.682195,-3.380596,19.0,19.0
25%,1.0,0.575773,0.0,38.0,53.0,-0.610713,-0.785301,70.0,49.302326
50%,3.0,1.789187,1.0,56.0,81.0,-0.101535,-0.161302,80.345455,53.527273
75%,8.0,6.147057,1.0,78.0,119.0,0.446641,0.655097,91.807692,59.595041
max,4969.0,3801.804829,1.0,197.0,218.0,2.672684,4.062958,144.6,135.0


<class 'pandas.core.frame.DataFrame'>
Index: 45629 entries, 0 to 45628
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   segment            45629 non-null  object 
 1   right_person_name  45629 non-null  object 
 2   ipc_class          45629 non-null  object 
 3   reg_num            45629 non-null  float64
 4   rca                45629 non-null  float64
 5   mcp                45629 non-null  int32  
 6   diversity          45629 non-null  int32  
 7   ubiquity           45629 non-null  int32  
 8   kci                45629 non-null  float64
 9   tci                45629 non-null  float64
 10  kh_1               45629 non-null  float64
 11  ki_1               45629 non-null  float64
dtypes: float64(6), int32(3), object(3)
memory usage: 4.0+ MB
None




Unnamed: 0,segment,right_person_name,ipc_class,reg_num,rca,mcp,diversity,ubiquity,kci,tci,kh_1,ki_1
0,1991-2000,いすゞ自動車株式会社,A23L,1.0,0.188398,0,72,138,-0.16855,2.329017,96.138889,37.92029
1,1991-2000,いすゞ自動車株式会社,A47H,1.0,6.098505,1,72,29,-0.16855,0.638004,96.138889,54.931034
2,1991-2000,いすゞ自動車株式会社,A61B,8.0,0.543683,0,72,73,-0.16855,-0.002019,96.138889,39.808219
3,1991-2000,いすゞ自動車株式会社,A61G,1.0,1.334627,1,72,86,-0.16855,-0.104236,96.138889,47.488372
4,1991-2000,いすゞ自動車株式会社,A61L,5.0,2.568793,1,72,215,-0.16855,1.327459,96.138889,46.930233


Unnamed: 0,reg_num,rca,mcp,diversity,ubiquity,kci,tci,kh_1,ki_1
count,58729.0,58729.0,58729.0,58729.0,58729.0,58729.0,58729.0,58729.0,58729.0
mean,16.755026,15.229258,0.64406,55.417664,113.728209,-0.034547,-0.05727,103.793581,49.955676
std,91.924141,79.06441,0.478801,31.401037,57.799293,0.902358,1.312662,18.052099,9.197423
min,0.33,0.003053,0.0,1.0,1.0,-3.538041,-4.080926,40.0,10.0
25%,1.0,0.592281,0.0,33.0,71.0,-0.506947,-0.647784,91.608696,44.393939
50%,2.0,1.925037,1.0,48.0,104.0,0.023659,-0.002019,104.273585,48.470149
75%,8.0,6.947244,1.0,70.0,154.0,0.454468,0.735044,116.181818,53.348958
max,5272.0,4895.55194,1.0,191.0,252.0,2.507241,3.819381,165.285714,126.833333


<class 'pandas.core.frame.DataFrame'>
Index: 58729 entries, 0 to 58728
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   segment            58729 non-null  object 
 1   right_person_name  58729 non-null  object 
 2   ipc_class          58729 non-null  object 
 3   reg_num            58729 non-null  float64
 4   rca                58729 non-null  float64
 5   mcp                58729 non-null  int32  
 6   diversity          58729 non-null  int32  
 7   ubiquity           58729 non-null  int32  
 8   kci                58729 non-null  float64
 9   tci                58729 non-null  float64
 10  kh_1               58729 non-null  float64
 11  ki_1               58729 non-null  float64
dtypes: float64(6), int32(3), object(3)
memory usage: 5.2+ MB
None




Unnamed: 0,segment,right_person_name,ipc_class,reg_num,rca,mcp,diversity,ubiquity,kci,tci,kh_1,ki_1
0,2001-2010,いすゞ自動車株式会社,A61B,5.0,0.258625,0,71,105,-0.161956,0.374433,94.605634,47.142857
1,2001-2010,いすゞ自動車株式会社,A61G,2.0,3.207465,1,71,109,-0.161956,0.136166,94.605634,50.642202
2,2001-2010,いすゞ自動車株式会社,A61L,1.0,0.620449,0,71,196,-0.161956,1.045828,94.605634,48.326531
3,2001-2010,いすゞ自動車株式会社,B01D,11.0,1.796983,1,71,249,-0.161956,0.844624,94.605634,49.618474
4,2001-2010,いすゞ自動車株式会社,B01J,2.0,0.329783,0,71,253,-0.161956,0.995135,94.605634,52.6917


Unnamed: 0,reg_num,rca,mcp,diversity,ubiquity,kci,tci,kh_1,ki_1
count,58721.0,58721.0,58721.0,58721.0,58721.0,58721.0,58721.0,58721.0,58721.0
mean,21.167638,18.717396,0.640554,52.325999,119.901466,0.070664,0.029265,109.235209,48.123177
std,123.470928,116.412698,0.479842,30.885782,62.557166,0.684791,1.027537,19.599673,9.128106
min,0.33,0.001163,0.0,1.0,1.0,-6.748853,-6.748853,40.0,8.911111
25%,1.0,0.559769,0.0,29.0,73.0,-0.285988,-0.536334,95.545455,42.282828
50%,2.0,1.955803,1.0,47.0,115.0,0.152109,0.102127,109.431034,47.776042
75%,8.0,7.346203,1.0,66.0,165.0,0.496537,0.699892,122.116279,52.348214
max,7803.5,8631.839583,1.0,179.0,289.0,1.770423,2.791817,187.764706,179.0


<class 'pandas.core.frame.DataFrame'>
Index: 58721 entries, 0 to 58720
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   segment            58721 non-null  object 
 1   right_person_name  58721 non-null  object 
 2   ipc_class          58721 non-null  object 
 3   reg_num            58721 non-null  float64
 4   rca                58721 non-null  float64
 5   mcp                58721 non-null  int32  
 6   diversity          58721 non-null  int32  
 7   ubiquity           58721 non-null  int32  
 8   kci                58721 non-null  float64
 9   tci                58721 non-null  float64
 10  kh_1               58721 non-null  float64
 11  ki_1               58721 non-null  float64
dtypes: float64(6), int32(3), object(3)
memory usage: 5.2+ MB
None




<a href="#top">Back to top</a>

---


<a id=output></a>

## **5. Output to Files**

<a id=rightperson></a>

### **5.1. Patent Holders**


In [12]:
# Whole period
all_right_person_df = pd.merge(all_c_df.groupby(['segment', 'right_person_name'])[['reg_num']].sum().reset_index(drop=False), 
                               all_c_df.groupby(['segment', 'right_person_name'])[['ipc_class']].nunique().reset_index(drop=False), 
                               on=['segment', 'right_person_name'], 
                               how='inner')
all_right_person_df = pd.merge(all_right_person_df, 
                               all_c_df[['segment', 'right_person_name', 'diversity', 'kh_1', 'kci']].drop_duplicates(keep='first'), 
                               on=['segment', 'right_person_name'], 
                               how='inner')
all_right_person_df['reg_num'] = all_right_person_df['reg_num'].astype(np.int64)
display(all_right_person_df)

# Each period
sep_year_right_person_df = pd.merge(sep_year_c_df.groupby(['segment', 'right_person_name'])[['reg_num']].sum().reset_index(drop=False), 
                               sep_year_c_df.groupby(['segment', 'right_person_name'])[['ipc_class']].nunique().reset_index(drop=False), 
                               on=['segment', 'right_person_name'], 
                               how='inner')
sep_year_right_person_df = pd.merge(sep_year_right_person_df, 
                               sep_year_c_df[['segment', 'right_person_name', 'diversity', 'kh_1', 'kci']].drop_duplicates(keep='first'), 
                               on=['segment', 'right_person_name'], 
                               how='inner')
sep_year_right_person_df['reg_num'] = sep_year_right_person_df['reg_num'].astype(np.int64)
display(sep_year_right_person_df)

right_person_df = pd.concat([all_right_person_df, sep_year_right_person_df], axis='index')
display(right_person_df)


Unnamed: 0,segment,right_person_name,reg_num,ipc_class,diversity,kh_1,kci
0,1981-2010,あすか製薬株式会社,181,19,17,254.000000,1.667732
1,1981-2010,いすゞ自動車株式会社,4255,166,80,165.750000,-0.425131
2,1981-2010,しげる工業株式会社,141,25,23,228.869565,0.044037
3,1981-2010,ぺんてる株式会社,1618,100,52,195.307692,0.334401
4,1981-2010,みのる産業株式会社,208,23,18,209.666667,0.612243
...,...,...,...,...,...,...,...
1933,1981-2010,ＵＢＥ株式会社,6004,231,95,187.052632,0.945821
1934,1981-2010,ＵＤトラツクス株式会社,1594,122,71,156.619718,-0.615255
1935,1981-2010,ＹＫＫ株式会社,1024,109,63,171.507937,0.372302
1936,1981-2010,ＹＫＫＡＰ株式会社,1657,65,32,214.750000,0.210663


Unnamed: 0,segment,right_person_name,reg_num,ipc_class,diversity,kh_1,kci
0,1981-1990,いすゞ自動車株式会社,1466,116,71,69.070423,-0.409268
1,1981-1990,ぺんてる株式会社,590,61,43,95.325581,0.362426
2,1981-1990,アイシン化工株式会社,156,27,24,112.125000,0.684015
3,1981-1990,アイジー工業株式会社,707,34,24,110.291667,0.339500
4,1981-1990,アイダエンジニアリング株式会社,270,39,33,100.848485,-0.559234
...,...,...,...,...,...,...,...
2821,2001-2010,ＵＢＥ株式会社,1433,97,63,116.031746,0.787791
2822,2001-2010,ＵＤトラツクス株式会社,810,85,59,94.966102,-0.234541
2823,2001-2010,ＹＫＫ株式会社,384,44,31,88.870968,0.468774
2824,2001-2010,ＹＫＫＡＰ株式会社,741,34,19,121.210526,0.223271


Unnamed: 0,segment,right_person_name,reg_num,ipc_class,diversity,kh_1,kci
0,1981-2010,あすか製薬株式会社,181,19,17,254.000000,1.667732
1,1981-2010,いすゞ自動車株式会社,4255,166,80,165.750000,-0.425131
2,1981-2010,しげる工業株式会社,141,25,23,228.869565,0.044037
3,1981-2010,ぺんてる株式会社,1618,100,52,195.307692,0.334401
4,1981-2010,みのる産業株式会社,208,23,18,209.666667,0.612243
...,...,...,...,...,...,...,...
2821,2001-2010,ＵＢＥ株式会社,1433,97,63,116.031746,0.787791
2822,2001-2010,ＵＤトラツクス株式会社,810,85,59,94.966102,-0.234541
2823,2001-2010,ＹＫＫ株式会社,384,44,31,88.870968,0.468774
2824,2001-2010,ＹＫＫＡＰ株式会社,741,34,19,121.210526,0.223271
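
The two-step merge used to build the per-holder tables could also be written as a single named aggregation. The sketch below shows this on a hypothetical three-row table shaped like `all_c_df`; it assumes `diversity`, `kh_1` and `kci` are constant within each (`segment`, `right_person_name`) group, which is also what the `drop_duplicates` step in the cell above relies on.

```python
import pandas as pd

# Hypothetical slice of a table shaped like all_c_df.
toy_c_df = pd.DataFrame({
    'segment': ['1981-2010'] * 3,
    'right_person_name': ['A', 'A', 'B'],
    'ipc_class': ['G06F', 'H04N', 'G06F'],
    'reg_num': [2.5, 1.5, 3.0],
    'diversity': [2, 2, 1],
    'kh_1': [1.5, 1.5, 2.0],
    'kci': [0.7, 0.7, -0.7],
})

holder_df = (toy_c_df.groupby(['segment', 'right_person_name'], as_index=False)
                     .agg(reg_num=('reg_num', 'sum'),          # total (fractional) patents
                          ipc_class=('ipc_class', 'nunique'),  # distinct IPC classes
                          diversity=('diversity', 'first'),
                          kh_1=('kh_1', 'first'),
                          kci=('kci', 'first')))
# As in the notebook, casting to int64 truncates any fractional remainder.
holder_df['reg_num'] = holder_df['reg_num'].astype('int64')
print(holder_df)
```

Whether this is preferable is a matter of taste: it avoids two joins, but the merge-based version would surface non-constant indicator values as extra rows, whereas `'first'` hides them.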


In [13]:
right_person_df.to_csv(f'../Data/0_RightPerson/{ar}_{year_start}_{year_end}.csv', 
                       encoding='utf-8', 
                       sep=',', 
                       index=False)


<a href="#top">Back to top</a>

---


<a id=ipc></a>

### **5.2. IPC**


In [14]:
# Whole period
all_ipc_df = pd.merge(all_c_df.groupby(['segment', 'ipc_class'])[['reg_num']].sum().reset_index(drop=False), 
                        all_c_df.groupby(['segment', 'ipc_class'])[['right_person_name']].nunique().reset_index(drop=False), 
                        on=['segment', 'ipc_class'], 
                        how='inner')
all_ipc_df = pd.merge(all_ipc_df, 
                      all_c_df[['segment', 'ipc_class', 'ubiquity', 'ki_1', 'tci']].drop_duplicates(keep='first'), 
                      on=['segment', 'ipc_class'], 
                      how='inner')
all_ipc_df['reg_num'] = all_ipc_df['reg_num'].astype(np.int64)
display(all_ipc_df)

# Each period
sep_year_ipc_df = pd.merge(sep_year_c_df.groupby(['segment', 'ipc_class'])[['reg_num']].sum().reset_index(drop=False), 
                        sep_year_c_df.groupby(['segment', 'ipc_class'])[['right_person_name']].nunique().reset_index(drop=False), 
                        on=['segment', 'ipc_class'], 
                        how='inner')
sep_year_ipc_df = pd.merge(sep_year_ipc_df, 
                      sep_year_c_df[['segment', 'ipc_class', 'ubiquity', 'ki_1', 'tci']].drop_duplicates(keep='first'), 
                      on=['segment', 'ipc_class'], 
                      how='inner')
sep_year_ipc_df['reg_num'] = sep_year_ipc_df['reg_num'].astype(np.int64)
display(sep_year_ipc_df)


ipc_df = pd.concat([all_ipc_df, sep_year_ipc_df], axis='index')
display(ipc_df)


Unnamed: 0,segment,ipc_class,reg_num,right_person_name,ubiquity,ki_1,tci
0,1981-2010,A01B,4528,65,48,48.729167,0.374633
1,1981-2010,A01C,6295,131,76,45.092105,1.160517
2,1981-2010,A01D,5836,80,56,47.089286,0.566273
3,1981-2010,A01F,3103,86,53,50.169811,0.750052
4,1981-2010,A01G,4821,581,445,49.741573,1.070829
...,...,...,...,...,...,...,...
622,1981-2010,H05G,642,57,49,57.857143,-0.695108
623,1981-2010,H05H,2738,217,151,67.039735,-0.405784
624,1981-2010,H05K,31595,813,284,40.619718,-1.497781
625,1981-2010,H10B,4,1,1,22.000000,-1.972941


Unnamed: 0,segment,ipc_class,reg_num,right_person_name,ubiquity,ki_1,tci
0,1981-1990,A01B,1177,18,15,50.533333,-0.139177
1,1981-1990,A01C,1818,43,21,49.523810,0.998249
2,1981-1990,A01D,1343,30,20,58.650000,-0.133186
3,1981-1990,A01F,800,33,25,53.320000,0.789687
4,1981-1990,A01G,1007,190,148,53.277027,1.098426
...,...,...,...,...,...,...,...
1820,2001-2010,H05G,247,33,30,56.100000,0.127064
1821,2001-2010,H05H,899,112,91,64.109890,0.209636
1822,2001-2010,H05K,15062,490,180,38.205556,-0.620678
1823,2001-2010,H10B,4,1,1,18.000000,-2.103911


Unnamed: 0,segment,ipc_class,reg_num,right_person_name,ubiquity,ki_1,tci
0,1981-2010,A01B,4528,65,48,48.729167,0.374633
1,1981-2010,A01C,6295,131,76,45.092105,1.160517
2,1981-2010,A01D,5836,80,56,47.089286,0.566273
3,1981-2010,A01F,3103,86,53,50.169811,0.750052
4,1981-2010,A01G,4821,581,445,49.741573,1.070829
...,...,...,...,...,...,...,...
1820,2001-2010,H05G,247,33,30,56.100000,0.127064
1821,2001-2010,H05H,899,112,91,64.109890,0.209636
1822,2001-2010,H05K,15062,490,180,38.205556,-0.620678
1823,2001-2010,H10B,4,1,1,18.000000,-2.103911


In [15]:
ipc_df.to_csv(f'../Data/0_IPC/{ar}_{year_start}_{year_end}.csv', 
                encoding='utf-8', 
                sep=',', 
                index=False)


<a href="#top">Back to top</a>

---


<a id=network></a>

### **5.3. For the Bipartite Graph**


In [16]:
graph_df = pd.concat([all_c_df, sep_year_c_df], axis='index')
graph_df = graph_df[graph_df['mcp']==1][['segment', 'right_person_name', 'ipc_class', 'mcp']]
graph_df

Unnamed: 0,segment,right_person_name,ipc_class,mcp
0,1981-2010,あすか製薬株式会社,A23D,1
1,1981-2010,あすか製薬株式会社,A23K,1
2,1981-2010,あすか製薬株式会社,A23L,1
3,1981-2010,あすか製薬株式会社,A47C,1
5,1981-2010,あすか製薬株式会社,A61G,1
...,...,...,...,...
58712,2001-2010,Ｚホールデイングス株式会社,G10K,1
58713,2001-2010,Ｚホールデイングス株式会社,G10L,1
58715,2001-2010,Ｚホールデイングス株式会社,H03M,1
58716,2001-2010,Ｚホールデイングス株式会社,H04H,1


In [17]:
graph_df.to_csv(f'../Data/0_Graph/{ar}_{year_start}_{year_end}.csv', 
                encoding='utf-8', 
                sep=',', 
                index=False)
graph_df


Unnamed: 0,segment,right_person_name,ipc_class,mcp
0,1981-2010,あすか製薬株式会社,A23D,1
1,1981-2010,あすか製薬株式会社,A23K,1
2,1981-2010,あすか製薬株式会社,A23L,1
3,1981-2010,あすか製薬株式会社,A47C,1
5,1981-2010,あすか製薬株式会社,A61G,1
...,...,...,...,...
58712,2001-2010,Ｚホールデイングス株式会社,G10K,1
58713,2001-2010,Ｚホールデイングス株式会社,G10L,1
58715,2001-2010,Ｚホールデイングス株式会社,H03M,1
58716,2001-2010,Ｚホールデイングス株式会社,H04H,1
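
An edge list in this shape can be turned into a holder x IPC-class biadjacency matrix for one segment. The sketch below uses `pd.crosstab` on a few hypothetical rows (the holder names `A`, `B`, `C` are made up); for the real file one would presumably filter on `segment` first.

```python
import pandas as pd

# Hypothetical edge list in the same shape as graph_df (one row per holder-class link).
edges = pd.DataFrame({
    'segment': ['1981-2010'] * 4,
    'right_person_name': ['A', 'A', 'B', 'C'],
    'ipc_class': ['G06F', 'H04N', 'G06F', 'H01L'],
    'mcp': [1, 1, 1, 1],
})

# Rows = holders, columns = IPC classes, 1 where an edge exists.
biadjacency = pd.crosstab(edges['right_person_name'], edges['ipc_class'])

holder_degree = biadjacency.sum(axis=1)   # corresponds to diversity
class_degree = biadjacency.sum(axis=0)    # corresponds to ubiquity
print(biadjacency)
```

The row and column sums of this matrix are the node degrees on each side of the bipartite graph, which is why they line up with `diversity` and `ubiquity`.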
