## Quarterly Constituent Rebalancing

### Constituent Selection Method

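>1. Paid-in capital of NT$8 billion or less, or market capitalization of NT$35 billion or less.
>
>2. Rank the stocks that satisfy rule 1 by market capitalization, then rank those same stocks by market capitalization within each industry. (The order matters: filter first, then rank.)
>
>3. Take the top 400 by market capitalization, plus the top-ranked stock in each industry.
>
>4. Constituent screening is complete.
>
>5. Note that stocks under warning status, or suspended from trading pending a material announcement, may be excluded; check the announcements and update the database accordingly.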
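
The rules above can be sketched end to end on a toy table. This is a minimal illustration, not the notebook's actual pipeline: the function name `select_constituents`, the English column names, the `top_n` parameter, and the use of `method="first"` for ranking are all assumptions introduced here, and the toy figures are made up.

```python
import pandas as pd

def select_constituents(df, top_n=400, capital_cap=8e9, mktcap_cap=35e9):
    """Hypothetical helper mirroring rules 1-3; column names are illustrative."""
    # Rule 1: keep stocks that pass either size test.
    passes = (df["capital"] <= capital_cap) | (df["market_cap"] <= mktcap_cap)
    pool = df[passes].copy()
    # Rule 2: rank the pool by market cap, then rank within each industry.
    pool["pool_rank"] = pool["market_cap"].rank(ascending=False, method="first")
    pool["industry_rank"] = (
        pool.groupby("industry")["market_cap"].rank(ascending=False, method="first")
    )
    # Rule 3: the top_n of the pool, plus each industry's leader.
    keep = (pool["pool_rank"] <= top_n) | (pool["industry_rank"] == 1)
    return pool[keep].sort_values("pool_rank")

# Made-up toy data: A fails both size tests, the rest pass at least one.
toy = pd.DataFrame({
    "ticker": ["A", "B", "C", "D", "E"],
    "industry": ["cement", "cement", "steel", "steel", "glass"],
    "capital": [90e9, 5e9, 3e9, 2e9, 1e9],
    "market_cap": [300e9, 20e9, 50e9, 10e9, 4e9],
})
selected = select_constituents(toy, top_n=2)
print(selected[["ticker", "pool_rank", "industry_rank"]])
```

In this toy run, A is removed by rule 1 before any ranking, so B becomes the cement leader. Ranking industries before filtering would have given that slot to A, which is why the order matters. E is outside the top 2 and enters only through the industry-leader rule.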

In [1]:
import pandas as pd
import numpy as np

### Step 1: Load the paid-in capital data

In [2]:
Capital_and_Industry_df = pd.read_excel('上市櫃普通股(不含DR)基本資料(每季更新).xlsx',sheet_name='實收資本額以及產業別')
Capital_and_Industry_df

Unnamed: 0,公司簡稱,實收資本額(元),TSE新產業_名稱
0,1101 台泥,77511817420,水泥工業
1,1102 亞泥,35465628810,水泥工業
2,1103 嘉泥,7902474590,水泥工業
3,1104 環泥,6866818160,水泥工業
4,1108 幸福,4047380000,水泥工業
...,...,...,...
1855,9951 皇田,749000000,電機機械
1856,9955 佳龍,1045137290,綠能環保
1857,9958 世紀鋼,2568966650,鋼鐵工業
1858,9960 邁達康,335925000,運動休閒


> Rename the `公司簡稱` column to `證券代碼` so it matches the key used in the later merge.

In [3]:
Capital_and_Industry_df = Capital_and_Industry_df.rename(columns={'公司簡稱':'證券代碼'})
Capital_and_Industry_df

Unnamed: 0,證券代碼,實收資本額(元),TSE新產業_名稱
0,1101 台泥,77511817420,水泥工業
1,1102 亞泥,35465628810,水泥工業
2,1103 嘉泥,7902474590,水泥工業
3,1104 環泥,6866818160,水泥工業
4,1108 幸福,4047380000,水泥工業
...,...,...,...
1855,9951 皇田,749000000,電機機械
1856,9955 佳龍,1045137290,綠能環保
1857,9958 世紀鋼,2568966650,鋼鐵工業
1858,9960 邁達康,335925000,運動休閒


### Step 2: Compute each stock's market capitalization

In [4]:
Market_Value_df = pd.read_excel('上市櫃普通股(不含DR)基本資料(每季更新).xlsx',sheet_name='自行計算市值')
Market_Value_df

Unnamed: 0,證券代碼,年月日,流通在外股數(千股),收盤價(元)
0,1101 台泥,2025-02-27,7551182,35.10
1,1102 亞泥,2025-02-27,3546563,41.80
2,1103 嘉泥,2025-02-27,790247,16.35
3,1104 環泥,2025-02-27,686682,30.30
4,1108 幸福,2025-02-27,404738,15.05
...,...,...,...,...
1853,9951 皇田,2025-02-27,74900,59.00
1854,9955 佳龍,2025-02-27,104514,30.20
1855,9958 世紀鋼,2025-02-27,256897,187.00
1856,9960 邁達康,2025-02-27,33593,26.30


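> Market capitalization = shares outstanding (in thousands) × unadjusted closing price × 1,000. The sheet does not supply it, so it has to be computed here.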

In [5]:
Market_Value_df = Market_Value_df.rename(columns={'收盤價(元)':'未調整收盤價(元)'})
Market_Value_df['市值(元)'] = (Market_Value_df['流通在外股數(千股)']  * Market_Value_df['未調整收盤價(元)']  * 1000)
Market_Value_df

Unnamed: 0,證券代碼,年月日,流通在外股數(千股),未調整收盤價(元),市值(元)
0,1101 台泥,2025-02-27,7551182,35.10,2.650465e+11
1,1102 亞泥,2025-02-27,3546563,41.80,1.482463e+11
2,1103 嘉泥,2025-02-27,790247,16.35,1.292054e+10
3,1104 環泥,2025-02-27,686682,30.30,2.080646e+10
4,1108 幸福,2025-02-27,404738,15.05,6.091307e+09
...,...,...,...,...,...
1853,9951 皇田,2025-02-27,74900,59.00,4.419100e+09
1854,9955 佳龍,2025-02-27,104514,30.20,3.156323e+09
1855,9958 世紀鋼,2025-02-27,256897,187.00,4.803974e+10
1856,9960 邁達康,2025-02-27,33593,26.30,8.834959e+08
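
As a check on the units, the first row of the table above (1101 台泥) can be reproduced by hand. The sketch below uses `Decimal` only to keep the arithmetic exact; the variable names are illustrative.

```python
from decimal import Decimal

# Figures for 1101 台泥 copied from the table above.
shares_thousands = Decimal("7551182")   # 流通在外股數(千股)
close_price = Decimal("35.10")          # 未調整收盤價(元)

# Shares are quoted in thousands, so multiply by 1000 to get NT dollars.
market_cap = shares_thousands * close_price * 1000

print(market_cap)                    # exact value in NT$
print(f"{float(market_cap):.6e}")    # same value in the notation pandas displays
```

The result is about NT$265 billion, which agrees with the `2.650465e+11` shown in the `市值(元)` column.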


### Step 3: Load the suspended-trading stocks

In [6]:
Suspend_trading_df = pd.read_excel('上市櫃普通股(不含DR)基本資料(每季更新).xlsx',sheet_name='換股日暫停交易股票')
Suspend_trading_df

Unnamed: 0,証券代碼,暫停交易起日,暫停交易原因,恢復交易日


### Step 4: Merge Steps 1 and 2, then drop the stocks from Step 3, to obtain `All_Stock_df`

In [7]:
All_Stock_df = pd.merge(Capital_and_Industry_df , Market_Value_df , on='證券代碼')

All_Stock_df = All_Stock_df[~All_Stock_df['證券代碼'].isin(Suspend_trading_df['証券代碼'])]

# Reorder the columns of All_Stock_df for readability
All_Stock_df = All_Stock_df[['證券代碼','年月日','流通在外股數(千股)','未調整收盤價(元)','TSE新產業_名稱','實收資本額(元)','市值(元)']]
All_Stock_df

Unnamed: 0,證券代碼,年月日,流通在外股數(千股),未調整收盤價(元),TSE新產業_名稱,實收資本額(元),市值(元)
0,1101 台泥,2025-02-27,7551182,35.10,水泥工業,77511817420,2.650465e+11
1,1102 亞泥,2025-02-27,3546563,41.80,水泥工業,35465628810,1.482463e+11
2,1103 嘉泥,2025-02-27,790247,16.35,水泥工業,7902474590,1.292054e+10
3,1104 環泥,2025-02-27,686682,30.30,水泥工業,6866818160,2.080646e+10
4,1108 幸福,2025-02-27,404738,15.05,水泥工業,4047380000,6.091307e+09
...,...,...,...,...,...,...,...
1853,9951 皇田,2025-02-27,74900,59.00,電機機械,749000000,4.419100e+09
1854,9955 佳龍,2025-02-27,104514,30.20,綠能環保,1045137290,3.156323e+09
1855,9958 世紀鋼,2025-02-27,256897,187.00,鋼鐵工業,2568966650,4.803974e+10
1856,9960 邁達康,2025-02-27,33593,26.30,運動休閒,335925000,8.834959e+08
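
The merge in this step is an inner join, so any ticker present in only one sheet is dropped without notice. The printed indices suggest the two sheets differ in length (last index 1858 versus 1856). One hedged way to list the unmatched tickers is an outer merge with `indicator=True`. The frames below are toy stand-ins, and `9999 範例` is a made-up ticker used only for illustration.

```python
import pandas as pd

# Toy stand-ins for Capital_and_Industry_df and Market_Value_df.
capital = pd.DataFrame({
    "證券代碼": ["1101 台泥", "1102 亞泥", "9999 範例"],   # last ticker is hypothetical
    "實收資本額(元)": [77511817420, 35465628810, 1000000000],
})
market = pd.DataFrame({
    "證券代碼": ["1101 台泥", "1102 亞泥"],
    "市值(元)": [2.650465e11, 1.482463e11],
})

# An outer merge keeps every ticker; the _merge column records which side each row came from.
check = pd.merge(capital, market, on="證券代碼", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", ["證券代碼", "_merge"]]
print(unmatched)
```

Rows marked `left_only` or `right_only` are the ones the inner merge would discard, so they are worth reviewing before the ranking step.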


### Step 4 (continued): Filter the constituents by the criteria

In [8]:
# Paid-in capital <= NT$8 billion, or market cap <= NT$35 billion.
All_Stock_df['是否符合中小型股特性'] = np.where((All_Stock_df['實收資本額(元)'] <= 8000000000) | (All_Stock_df['市值(元)'] <= 35000000000),"有符合特性",'無')

# Rank the qualifying stocks by market cap (stocks that do not qualify get NaN)
All_Stock_df['中小型股池內的市值排名'] = All_Stock_df.loc[All_Stock_df['是否符合中小型股特性'] == '有符合特性' , '市值(元)'].rank(ascending=False)
All_Stock_df

Unnamed: 0,證券代碼,年月日,流通在外股數(千股),未調整收盤價(元),TSE新產業_名稱,實收資本額(元),市值(元),是否符合中小型股特性,中小型股池內的市值排名
0,1101 台泥,2025-02-27,7551182,35.10,水泥工業,77511817420,2.650465e+11,無,
1,1102 亞泥,2025-02-27,3546563,41.80,水泥工業,35465628810,1.482463e+11,無,
2,1103 嘉泥,2025-02-27,790247,16.35,水泥工業,7902474590,1.292054e+10,有符合特性,417.0
3,1104 環泥,2025-02-27,686682,30.30,水泥工業,6866818160,2.080646e+10,有符合特性,264.0
4,1108 幸福,2025-02-27,404738,15.05,水泥工業,4047380000,6.091307e+09,有符合特性,800.0
...,...,...,...,...,...,...,...,...,...
1853,9951 皇田,2025-02-27,74900,59.00,電機機械,749000000,4.419100e+09,有符合特性,966.0
1854,9955 佳龍,2025-02-27,104514,30.20,綠能環保,1045137290,3.156323e+09,有符合特性,1144.0
1855,9958 世紀鋼,2025-02-27,256897,187.00,鋼鐵工業,2568966650,4.803974e+10,有符合特性,93.0
1856,9960 邁達康,2025-02-27,33593,26.30,運動休閒,335925000,8.834959e+08,有符合特性,1669.0
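
`rank(ascending=False)` uses pandas' default `method="average"`, so tied market caps receive a fractional rank. The sketch below compares three tie-breaking methods on a made-up series. Which one the index methodology intends is not stated in this notebook, so treat the choice as an open assumption.

```python
import pandas as pd

# Made-up market caps with one tie between B and C.
caps = pd.Series([300.0, 200.0, 200.0, 100.0], index=["A", "B", "C", "D"])

average_rank = caps.rank(ascending=False)                  # default: ties share the mean rank
min_rank = caps.rank(ascending=False, method="min")        # ties share the best rank
first_rank = caps.rank(ascending=False, method="first")    # ties broken by order of appearance

print(pd.DataFrame({"average": average_rank, "min": min_rank, "first": first_rank}))
```

With a `<= 400` cutoff, a tie that straddles the boundary can admit slightly fewer than 400 names under `average` or slightly more under `min`. `method="first"` always yields exactly 400.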


In [9]:
# Build the stock pool: dropna() removes every row containing a NaN, which here drops the unranked stocks
Stock_Pool_df = All_Stock_df.dropna()
Stock_Pool_df

Unnamed: 0,證券代碼,年月日,流通在外股數(千股),未調整收盤價(元),TSE新產業_名稱,實收資本額(元),市值(元),是否符合中小型股特性,中小型股池內的市值排名
2,1103 嘉泥,2025-02-27,790247,16.35,水泥工業,7902474590,1.292054e+10,有符合特性,417.0
3,1104 環泥,2025-02-27,686682,30.30,水泥工業,6866818160,2.080646e+10,有符合特性,264.0
4,1108 幸福,2025-02-27,404738,15.05,水泥工業,4047380000,6.091307e+09,有符合特性,800.0
5,1109 信大,2025-02-27,341159,17.95,水泥工業,3411588680,6.123804e+09,有符合特性,795.0
6,1110 東泥,2025-02-27,572001,20.05,水泥工業,5720007970,1.146862e+10,有符合特性,474.0
...,...,...,...,...,...,...,...,...,...
1853,9951 皇田,2025-02-27,74900,59.00,電機機械,749000000,4.419100e+09,有符合特性,966.0
1854,9955 佳龍,2025-02-27,104514,30.20,綠能環保,1045137290,3.156323e+09,有符合特性,1144.0
1855,9958 世紀鋼,2025-02-27,256897,187.00,鋼鐵工業,2568966650,4.803974e+10,有符合特性,93.0
1856,9960 邁達康,2025-02-27,33593,26.30,運動休閒,335925000,8.834959e+08,有符合特性,1669.0


In [10]:
# Rank market cap within each industry of the pool.
# assign() returns a new DataFrame, which avoids the SettingWithCopyWarning
# that in-place column assignment on the dropna() result can trigger.
Stock_Pool_df = Stock_Pool_df.assign(**{'股池內的同產業市值排名': Stock_Pool_df.groupby('TSE新產業_名稱')['市值(元)'].rank(ascending=False)})

# Sort by pool rank to sanity-check the result
Stock_Pool_df = Stock_Pool_df.sort_values(by='中小型股池內的市值排名',ascending=True)
Stock_Pool_df


Unnamed: 0,證券代碼,年月日,流通在外股數(千股),未調整收盤價(元),TSE新產業_名稱,實收資本額(元),市值(元),是否符合中小型股特性,中小型股池內的市值排名,股池內的同產業市值排名
330,2357 華碩,2025-02-27,742760,682.00,電腦及週邊,7427602800,5.065623e+11,有符合特性,1.0,1.0
320,2345 智邦,2025-02-27,561118,673.00,通信網路業,5611178970,3.776324e+11,有符合特性,2.0,1.0
1513,6669 緯穎,2025-02-27,185841,1965.00,電腦及週邊,1858407910,3.651776e+11,有符合特性,3.0,2.0
576,3008 大立光,2025-02-27,133468,2700.00,光電業,1334681970,3.603636e+11,有符合特性,4.0,1.0
280,2207 和泰車,2025-02-27,557103,620.00,汽車工業,5571027680,3.454039e+11,有符合特性,5.0,1.0
...,...,...,...,...,...,...,...,...,...,...
1675,8067 志旭,2025-02-27,20700,16.75,電子通路業,207000000,3.467250e+08,有符合特性,1748.0,32.0
928,4304 勝昱,2025-02-27,36038,9.00,塑膠工業,360380650,3.243420e+08,有符合特性,1749.0,22.0
700,3288 點晶,2025-02-27,15000,20.75,電子零組件,150000000,3.112500e+08,有符合特性,1750.0,203.0
1031,4804 大略-KY,2025-02-27,49003,5.31,觀光餐旅,490029200,2.602059e+08,有符合特性,1751.0,48.0


In [11]:
# Final constituent selection: top 400 in the pool by market cap, or first within its industry
Small_Mid_Cap_df = Stock_Pool_df[(Stock_Pool_df['中小型股池內的市值排名'] <= 400) | (Stock_Pool_df['股池內的同產業市值排名'] == 1)]
Small_Mid_Cap_df

Unnamed: 0,證券代碼,年月日,流通在外股數(千股),未調整收盤價(元),TSE新產業_名稱,實收資本額(元),市值(元),是否符合中小型股特性,中小型股池內的市值排名,股池內的同產業市值排名
330,2357 華碩,2025-02-27,742760,682.00,電腦及週邊,7427602800,5.065623e+11,有符合特性,1.0,1.0
320,2345 智邦,2025-02-27,561118,673.00,通信網路業,5611178970,3.776324e+11,有符合特性,2.0,1.0
1513,6669 緯穎,2025-02-27,185841,1965.00,電腦及週邊,1858407910,3.651776e+11,有符合特性,3.0,2.0
576,3008 大立光,2025-02-27,133468,2700.00,光電業,1334681970,3.603636e+11,有符合特性,4.0,1.0
280,2207 和泰車,2025-02-27,557103,620.00,汽車工業,5571027680,3.454039e+11,有符合特性,5.0,1.0
...,...,...,...,...,...,...,...,...,...,...
1744,8374 羅昇,2025-02-27,112250,122.00,電機機械,1122504870,1.369450e+10,有符合特性,398.0,14.0
127,1532 勤美,2025-02-27,422604,32.30,電機機械,4226042710,1.365011e+10,有符合特性,399.0,15.0
474,2617 台航,2025-02-27,417294,32.55,航運業,4172944870,1.358292e+10,有符合特性,400.0,9.0
218,1810 和成,2025-02-27,302304,17.15,玻璃陶瓷,3023037190,5.184514e+09,有符合特性,878.0,1.0


### Step 5: Reshape the data for saving as the new Base_data

In [12]:
Season_Base_data = Small_Mid_Cap_df[['年月日','證券代碼','流通在外股數(千股)','未調整收盤價(元)']]
Season_Base_data.columns = ['年月日','證券代碼','流通在外股數','收盤價(元)']
Season_Base_data

Unnamed: 0,年月日,證券代碼,流通在外股數,收盤價(元)
330,2025-02-27,2357 華碩,742760,682.00
320,2025-02-27,2345 智邦,561118,673.00
1513,2025-02-27,6669 緯穎,185841,1965.00
576,2025-02-27,3008 大立光,133468,2700.00
280,2025-02-27,2207 和泰車,557103,620.00
...,...,...,...,...
1744,2025-02-27,8374 羅昇,112250,122.00
127,2025-02-27,1532 勤美,422604,32.30
474,2025-02-27,2617 台航,417294,32.55
218,2025-02-27,1810 和成,302304,17.15


In [13]:
Season_Base_data.to_excel('Base_Data.xlsx',index=False)