
> 2023.06.10 Ssu-Yun Wang<br/>
[Github @angel870326](https://github.com/angel870326)

# **Variable for Monthly Revenue Forecasting - Search Volume (2013-2022)**

### Contents

1. Get Company List
  *  1.1 Read Data
  *  1.2 Create Company List
2. Calculate Search Volume Score
  *  2.0 Setup
  *  2.1 Class & Methods
  *  2.2 Individual Company (e.g. 2330 台積電 & 1110 東泥)
  *  2.3 All the Companies
  *  2.4 Output



## **0. Setup**

In [1]:
# Connect to Google Drive
from google.colab import drive
drive.mount("/content/gdrive")

Mounted at /content/gdrive


In [2]:
import os
import pandas as pd
import numpy as np

### **Project Path**

In [3]:
project_path = '/content/gdrive/Shareddrives/Me/論文'

## **1. Get Company List**


### **1.1 Read Data**

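【**Monthly Revenue & Earnings (2013-2022)**】

Data period: January 2013 to December 2022 (120 months)

Data scope: listed (TWSE) and OTC (TPEx) companies, excluding the financial, biotech & medical, and building materials & construction industries, as well as DR and KY companies

Data source: TEJ Company DB, Market Observation Post System (公開資訊觀測站)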


In [4]:
org_data = pd.read_excel(os.path.join(project_path, '資料集/007_v1/201301-202212上市櫃公司月營收_非金融業.xlsx'), index_col=0)
org_data.columns = pd.to_datetime(org_data.columns, format="%Y-%m-%d").to_period('M')
org_data

公司,2013-01,2013-02,2013-03,2013-04,2013-05,2013-06,2013-07,2013-08,2013-09,2013-10,...,2022-03,2022-04,2022-05,2022-06,2022-07,2022-08,2022-09,2022-10,2022-11,2022-12
1101 台泥,9134465,5540346,9457971,9919269,9543782,9517630,9875888,9835143,10060975,10654077,...,9971650,8319342,7733787,9145989,10102468,10689860,10404901,11368096,9674576,12584154
1102 亞泥,6018213,2552357,5428755,5930748,6239676,5952754,5942364,5786107,5879394,6478670,...,8160414,8710220,8000427,7776413,7864622,7069221,6994078,7601097,8306062,8340507
1103 嘉泥,288455,166638,286007,365292,382601,302995,294781,336088,314563,429783,...,220463,168089,163521,183177,178825,182371,205264,209429,221763,228644
1104 環泥,486481,299860,461732,394631,406677,415968,453397,393203,448691,521445,...,591593,638493,537082,573028,580420,605512,597159,634981,631827,725055
1108 幸福,481802,276936,444917,362054,381384,368109,439572,379115,387362,450770,...,345612,335518,332258,334113,326691,390053,346635,401202,383773,418326
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9951 皇田,201785,167967,240746,243935,238296,193880,198427,256724,228796,250756,...,374229,302262,323433,371791,337581,468608,464373,432835,500111,506796
9955 佳龍,394489,383183,428478,564053,336622,295391,434605,306534,266617,363766,...,96200,101850,95096,80726,85625,81881,79179,80630,91270,84115
9958 世紀鋼,198944,166364,351222,280864,289332,426371,213281,302589,401695,255738,...,626104,401960,673479,665459,651699,757968,903198,911834,944060,1082675
9960 邁達康,52534,41935,61642,70998,81508,64525,62085,60960,60309,61582,...,60275,86754,69752,103280,64983,105969,113755,78996,96570,58764


In [5]:
print("Data shape:", org_data.shape)
print("Data size:", org_data.size)

Data shape: (1240, 120)
Data size: 148800
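With 1,240 rows and 120 columns, the frame holds one row per company and one column per month. Because later cells relabel every trends file with `org_data.columns`, it is worth confirming that those columns really run from 2013-01 to 2022-12 without gaps. The sketch below runs on a small toy frame instead of the Excel file; `check_month_coverage` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

def check_month_coverage(df: pd.DataFrame, start: str = "2013-01", end: str = "2022-12") -> bool:
    """Return True if the columns of df are exactly the monthly periods start..end, in order."""
    expected = pd.period_range(start=start, end=end, freq="M")
    return df.columns.equals(expected)

# Toy frame: 2 hypothetical companies x 120 monthly dates, converted the same way as org_data
dates = pd.date_range("2013-01-01", periods=120, freq="MS")
toy = pd.DataFrame(0, index=["A", "B"], columns=dates)
toy.columns = toy.columns.to_period("M")

print(toy.shape)                                  # (2, 120)
print(check_month_coverage(toy))                  # True
print(check_month_coverage(toy.iloc[:, :-1]))     # False: 2022-12 is missing
```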


### **1.2 Create Company List**

In [6]:
# Company list
company_list = org_data.index
company_list

Index(['1101 台泥', '1102 亞泥', '1103 嘉泥', '1104 環泥', '1108 幸福', '1109 信大',
       '1110 東泥', '1201 味全', '1203 味王', '1210 大成',
       ...
       '9943 好樂迪', '9944 新麗', '9945 潤泰新', '9949 琉園', '9950 萬國通', '9951 皇田',
       '9955 佳龍', '9958 世紀鋼', '9960 邁達康', '9962 有益'],
      dtype='object', name='公司', length=1240)
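Each label combines the stock code and the short company name separated by a space, and the same string is used as the file name of the company's trends CSV in `readCompanyTrends`. If the code and name are needed separately, for example to merge with data keyed by stock code, a minimal sketch follows; it assumes exactly this `code name` format. `Index.get_loc` also gives the integer position that `companySearchVolume` takes as its `index` argument.

```python
import pandas as pd

# Labels in the same "code name" format as company_list (a small subset)
labels = pd.Index(["1101 台泥", "1110 東泥", "2330 台積電"], name="公司")

# Split on the first space only, so a name containing spaces would stay in one piece
parts = pd.Series(labels).str.split(" ", n=1, expand=True)
parts.columns = ["code", "name"]
parts.index = labels
print(parts)

# Integer position of a label, usable as the `index` argument of companySearchVolume
print(labels.get_loc("2330 台積電"))   # 2
```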

## **2. Calculate Search Volume Score**

### **2.0 Setup**

In [7]:
import time

In [8]:
trends_save_path = os.path.join(project_path, '資料集/google trends')

### **2.1 Class & Methods**

In [9]:
class MonthlySearchVolume():

    def __init__(self, company_list: list, year_start: int, year_end: int):
        self.company_list = company_list
        self.year_start = year_start    # stored for reference; not used by the methods below
        self.year_end = year_end


    #-------------------- Individual Company --------------------

    def readCompanyTrends(self, index):
        # Read one company's monthly Google Trends file and keep the four lagged rows t-4 ... t-1
        file_name = self.company_list[index]
        trendsD = pd.read_csv(os.path.join(trends_save_path, f'monthly/{file_name}.csv'), index_col = 0).loc[['t-4', 't-3', 't-2', 't-1']]
        # Relabel the columns with the revenue months (assumes one column per month, same order as org_data)
        trendsD.columns = org_data.columns

        return trendsD


    def companySearchVolume(self, index, trendsD: pd.DataFrame = None):
        # Returns two one-row DataFrames indexed by the company label:
        #   search_volume_1 = mean of rows t-4 ... t-1 (NaN skipped), search_volume_2 = row t-1
        if trendsD is None or trendsD.empty:
            trendsD = self.readCompanyTrends(index)

        company = self.company_list[index]    # use the instance's list, not the global company_list

        if trendsD.isnull().all().all():    # company without trends data -> all zeros
            search_volume_1 = pd.DataFrame(0, columns = org_data.columns, index = [company])
            search_volume_2 = pd.DataFrame(0, columns = org_data.columns, index = [company])
        else:
            search_volume_1 = trendsD.mean().to_frame().T    # column-wise mean; trendsD is not modified
            search_volume_1.index = [company]
            search_volume_2 = trendsD.loc[['t-1']].copy()
            search_volume_2.index = [company]

        return search_volume_1, search_volume_2


    #-------------------- All the Companies --------------------

    def allCompanySearchVolume(self):
        start = time.time()

        # Collect the one-row frames in lists and concatenate once at the end;
        # growing a DataFrame with pd.concat inside the loop copies it on every iteration
        results_1, results_2 = [], []
        for i in range(len(self.company_list)):
            search_volume_1, search_volume_2 = self.companySearchVolume(i)
            results_1.append(search_volume_1)
            results_2.append(search_volume_2)

        all_search_volume_1 = pd.concat(results_1)
        all_search_volume_2 = pd.concat(results_2)

        runtime = "%.3f"%(time.time() - start)
        print(f"Time spent: {runtime} sec.")

        return all_search_volume_1, all_search_volume_2
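The two scores returned by `companySearchVolume` can be written as follows. The symbols are our own notation, read off the code rather than taken from the author: $g_{i,t}^{(k)}$ is the value stored in row `t-k` of company $i$'s trends file, in the column for revenue month $t$.

$$
\mathrm{SV}^{\text{mean}}_{i,t} = \frac{1}{|K_{i,t}|}\sum_{k \in K_{i,t}} g_{i,t}^{(k)}, \qquad
\mathrm{SV}^{t-1}_{i,t} = g_{i,t}^{(1)}, \qquad
K_{i,t} = \{\, k \in \{1,2,3,4\} : g_{i,t}^{(k)} \text{ is not missing} \,\}
$$

When all four lags are present, $\mathrm{SV}^{\text{mean}}_{i,t} = \tfrac{1}{4}\sum_{k=1}^{4} g_{i,t}^{(k)}$; if a column has no non-missing lag, pandas returns NaN for it. If company $i$ has no trends data at all, both scores are set to 0 for every month. These correspond to `search_volume_mean.csv` and `search_volume_t-1.csv` written in Section 2.4.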


### **2.2 Individual Company (e.g. 2330 台積電 & 1110 東泥)**

In [10]:
mt = MonthlySearchVolume(company_list, 2013, 2022)

In [11]:
# 2330 台積電
sv1, sv2 = mt.companySearchVolume(company_list.get_loc('2330 台積電'))
display(sv1)
display(sv2)

,2013-01,2013-02,2013-03,2013-04,2013-05,2013-06,2013-07,2013-08,2013-09,2013-10,...,2022-03,2022-04,2022-05,2022-06,2022-07,2022-08,2022-09,2022-10,2022-11,2022-12
2330 台積電,69.0,70.25,35.5,88.0,44.5,35.75,68.0,47.0,55.0,60.75,...,47.75,46.75,62.75,88.5,56.25,67.0,45.25,34.5,88.0,45.5


,2013-01,2013-02,2013-03,2013-04,2013-05,2013-06,2013-07,2013-08,2013-09,2013-10,...,2022-03,2022-04,2022-05,2022-06,2022-07,2022-08,2022-09,2022-10,2022-11,2022-12
2330 台積電,86.0,94.0,35.0,89.0,25.0,44.0,96.0,67.0,53.0,66.0,...,78.0,87.0,55.0,84.0,83.0,100.0,41.0,48.0,94.0,75.0


In [12]:
# 1110 東泥
sv1, sv2 = mt.companySearchVolume(company_list.get_loc('1110 東泥'))
display(sv1)
display(sv2)

,2013-01,2013-02,2013-03,2013-04,2013-05,2013-06,2013-07,2013-08,2013-09,2013-10,...,2022-03,2022-04,2022-05,2022-06,2022-07,2022-08,2022-09,2022-10,2022-11,2022-12
1110 東泥,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


,2013-01,2013-02,2013-03,2013-04,2013-05,2013-06,2013-07,2013-08,2013-09,2013-10,...,2022-03,2022-04,2022-05,2022-06,2022-07,2022-08,2022-09,2022-10,2022-11,2022-12
1110 東泥,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
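The all-zero rows for 1110 東泥 are consistent with the zero-fill branch, which is taken when every value in a company's trends file is missing. The sketch below mirrors the logic of `companySearchVolume` on in-memory data so both branches can be checked without Google Drive; `two_scores`, the labels `XXXX toy` / `YYYY none`, and all values are hypothetical, not real Google Trends data.

```python
import numpy as np
import pandas as pd

def two_scores(trends: pd.DataFrame, label: str):
    """Mirror companySearchVolume on an in-memory frame with rows t-4 ... t-1."""
    if trends.isnull().all().all():              # no trends data at all -> zeros
        zeros = pd.DataFrame(0, columns=trends.columns, index=[label])
        return zeros, zeros.copy()
    mean_row = trends.mean().to_frame().T        # column-wise mean, NaN skipped
    mean_row.index = [label]
    last_row = trends.loc[["t-1"]].copy()        # most recent lag only
    last_row.index = [label]
    return mean_row, last_row

months = pd.period_range("2013-01", periods=3, freq="M")
lags = ["t-4", "t-3", "t-2", "t-1"]

# Hypothetical lagged values; one NaN in the 2013-03 column
toy = pd.DataFrame([[40, 0, 10], [60, 20, np.nan], [80, 40, 30], [100, 60, 50]],
                   index=lags, columns=months, dtype=float)
sv1, sv2 = two_scores(toy, "XXXX toy")
print(sv1)   # values: 70.0, 30.0, 30.0  (2013-03 averages only 10, 30, 50)
print(sv2)   # values: 100.0, 60.0, 50.0

empty = pd.DataFrame(np.nan, index=lags, columns=months)
z1, z2 = two_scores(empty, "YYYY none")
print(z1)    # all zeros, like the 1110 東泥 rows above
```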


### **2.3 All the Companies (~5 min. runtime)**

In [13]:
mt = MonthlySearchVolume(company_list, 2013, 2022)
all_search_volume_1, all_search_volume_2 = mt.allCompanySearchVolume()

Time spent: 294.195 sec.


In [14]:
all_search_volume_1

,2013-01,2013-02,2013-03,2013-04,2013-05,2013-06,2013-07,2013-08,2013-09,2013-10,...,2022-03,2022-04,2022-05,2022-06,2022-07,2022-08,2022-09,2022-10,2022-11,2022-12
1101 台泥,42.0,42.50,0.00,25.00,25.50,13.00,25.00,0.00,8.00,10.50,...,40.00,36.00,62.00,41.75,41.25,46.25,51.00,36.75,70.75,54.75
1102 亞泥,34.5,0.00,27.50,20.75,20.50,27.75,13.75,22.50,13.75,4.50,...,36.50,30.50,56.25,38.00,32.50,37.50,58.25,26.25,31.75,65.25
1103 嘉泥,0.0,14.50,21.00,31.75,0.00,15.00,8.00,25.00,23.00,24.75,...,10.50,9.75,15.75,15.50,14.00,38.75,7.25,21.25,0.00,7.25
1104 環泥,25.0,0.00,40.00,0.00,0.00,36.25,0.00,0.00,25.00,25.00,...,6.75,33.50,27.00,0.00,0.00,0.00,20.75,5.75,32.00,9.25
1108 幸福,64.5,59.00,76.50,75.00,65.75,66.25,77.50,90.25,74.75,74.25,...,76.75,84.50,78.25,70.00,76.00,86.00,81.25,84.75,82.50,69.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9951 皇田,0.0,14.00,12.50,8.25,4.50,0.00,13.00,19.00,15.75,8.25,...,9.50,34.75,33.75,0.00,11.50,20.25,0.00,0.00,0.00,0.00
9955 佳龍,25.5,7.25,25.00,13.50,29.75,23.75,53.00,0.00,0.00,38.00,...,44.00,30.25,18.25,19.50,30.25,36.50,47.00,54.00,45.00,38.25
9958 世紀鋼,25.0,0.00,24.25,6.25,36.25,0.00,0.00,0.00,9.25,0.00,...,44.75,32.00,30.25,12.25,5.25,19.50,56.50,56.00,53.25,33.00
9960 邁達康,25.0,0.00,52.50,0.00,34.75,0.00,25.50,0.00,0.00,25.00,...,44.75,19.25,28.00,14.25,46.00,4.50,12.25,11.75,46.00,47.00


In [15]:
all_search_volume_2

,2013-01,2013-02,2013-03,2013-04,2013-05,2013-06,2013-07,2013-08,2013-09,2013-10,...,2022-03,2022-04,2022-05,2022-06,2022-07,2022-08,2022-09,2022-10,2022-11,2022-12
1101 台泥,57.0,66.0,0.0,0.0,0.0,27.0,100.0,0.0,0.0,0.0,...,20.0,53.0,66.0,19.0,58.0,30.0,62.0,57.0,79.0,73.0
1102 亞泥,85.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,54.0,52.0,78.0,74.0,70.0,50.0,81.0,49.0,0.0,100.0
1103 嘉泥,0.0,58.0,0.0,94.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,28.0,0.0,0.0,0.0,0.0,29.0
1104 環泥,0.0,0.0,96.0,0.0,0.0,45.0,0.0,0.0,0.0,0.0,...,0.0,0.0,54.0,0.0,0.0,0.0,0.0,0.0,58.0,0.0
1108 幸福,83.0,56.0,73.0,80.0,65.0,66.0,92.0,94.0,74.0,75.0,...,82.0,92.0,71.0,67.0,69.0,77.0,84.0,77.0,92.0,62.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9951 皇田,0.0,0.0,0.0,0.0,18.0,0.0,52.0,0.0,0.0,0.0,...,0.0,39.0,72.0,0.0,46.0,0.0,0.0,0.0,0.0,0.0
9955 佳龍,43.0,0.0,0.0,0.0,0.0,0.0,42.0,0.0,0.0,0.0,...,32.0,100.0,35.0,0.0,36.0,43.0,37.0,71.0,0.0,0.0
9958 世紀鋼,100.0,0.0,64.0,0.0,85.0,0.0,0.0,0.0,0.0,0.0,...,38.0,0.0,35.0,32.0,0.0,48.0,60.0,23.0,100.0,0.0
9960 邁達康,100.0,0.0,78.0,0.0,0.0,0.0,33.0,0.0,0.0,0.0,...,21.0,0.0,0.0,0.0,100.0,0.0,49.0,47.0,35.0,27.0
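Google Trends reports interest on a 0-100 scale, so both tables should stay within that range, contain no missing values, and have exactly one row per company in `company_list` order. A small validation sketch follows; `validate_scores` is a hypothetical helper, and the toy values are copied from the first rows of `all_search_volume_1` above. `Index.equals` compares only the labels and their order, not the index name, so the unnamed index of the result can be compared with `company_list` (named `公司`) directly.

```python
import numpy as np
import pandas as pd

def validate_scores(scores: pd.DataFrame, companies: pd.Index) -> list:
    """Return a list of problems found in a company x month score table (empty if none)."""
    problems = []
    if not scores.index.equals(companies):      # same labels in the same order (names ignored)
        problems.append("row labels differ from the company list")
    values = scores.to_numpy(dtype=float)
    if np.isnan(values).any():
        problems.append("missing values present")
    if ((values < 0) | (values > 100)).any():   # NaN compares False, so it is not double-counted
        problems.append("values outside the 0-100 Google Trends scale")
    return problems

companies = pd.Index(["1101 台泥", "1102 亞泥"], name="公司")
months = pd.period_range("2013-01", periods=2, freq="M")
good = pd.DataFrame([[42.0, 42.5], [34.5, 0.0]], index=list(companies), columns=months)

print(validate_scores(good, companies))   # []

bad = good.copy()
bad.iloc[0, 0] = 120.0
print(validate_scores(bad, companies))    # ['values outside the 0-100 Google Trends scale']
```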


### **2.4 Output**

In [16]:
output_path = os.path.join(project_path, '資料集/search volume')
os.makedirs(output_path, exist_ok=True)    # create the folder if it does not exist yet

In [17]:
all_search_volume_1.to_csv(os.path.join(output_path, 'search_volume_mean.csv'))

In [18]:
all_search_volume_2.to_csv(os.path.join(output_path, 'search_volume_t-1.csv'))
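`to_csv` writes the monthly Period column labels as plain strings such as `2013-01`, so when these files are read back the columns come in as text. The sketch below shows one way to restore the monthly `PeriodIndex`; it writes to an in-memory buffer instead of the Drive folder, and the values are copied from the first rows of `all_search_volume_1` above.

```python
import io
import numpy as np
import pandas as pd

# A small frame shaped like all_search_volume_1: company labels x monthly Periods
months = pd.period_range("2013-01", periods=3, freq="M")
scores = pd.DataFrame([[42.0, 42.5, 0.0], [34.5, 0.0, 27.5]],
                      index=["1101 台泥", "1102 亞泥"], columns=months)

buffer = io.StringIO()
scores.to_csv(buffer)            # same call as in the cells above, but into memory
buffer.seek(0)

restored = pd.read_csv(buffer, index_col=0)                       # header read as strings
restored.columns = pd.PeriodIndex(restored.columns, freq="M")     # back to monthly Periods

print(restored.columns.equals(scores.columns))                    # True
print(list(restored.index) == list(scores.index))                 # True
print(np.array_equal(restored.to_numpy(), scores.to_numpy()))     # True
```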