#  Knowledge-Driven Financial Analysis

Knowledge-driven financial analysis combines traditional financial metrics with analytics and machine learning, drawing on an understanding of financial principles and market dynamics to produce more accurate and more predictive analyses of financial data and trends. This notebook applies the idea by joining labeled news items with company relationships taken from a knowledge graph.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
import os
import pandas as pd

# Specify the folder path and get a list of CSV files
folder_path = '/content/drive/MyDrive/Colab Notebooks/notebook/5002 data mining final project/KnowledgeGraph'
csv_files = [file for file in os.listdir(folder_path) if file.endswith('.csv') and file.startswith('hidy.relationships.')]

# Create an empty list to store DataFrames
dfs = []

# Iterate through each CSV file
for file in csv_files:
    file_path = os.path.join(folder_path, file)
    df = pd.read_csv(file_path)
    dfs.append(df)

# Concatenate into a single DataFrame
merged_news_data = pd.concat(dfs, ignore_index=True)

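# Encode relationship types in ':TYPE': 1 for cooperate/invest/same_industry/supply, 0 for compete/dispute.
# Assigning the result back avoids chained in-place replacement, which newer pandas versions deprecate.
type_map = {'cooperate': 1, 'invest': 1, 'same_industry': 1, 'supply': 1, 'compete': 0, 'dispute': 0}
merged_news_data[':TYPE'] = merged_news_data[':TYPE'].replace(type_map)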

del merged_news_data['time']
merged_news_data.to_csv('merged_data.csv', index=False)

In [3]:
df_hidy = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/notebook/5002 data mining final project/KnowledgeGraph/hidy.nodes.company.csv')

# Create mapping dictionaries
company_to_id = dict(zip(df_hidy['company_name'], df_hidy[':ID']))
id_to_company = dict(zip(df_hidy[':ID'], df_hidy['company_name']))

df_company = pd.read_excel('/content/drive/MyDrive/Colab Notebooks/notebook/5002 data mining final project/result_df_with_sentiment.xlsx')
df_relationships = pd.read_csv('/content/merged_data.csv')

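# Create a new column 'Explicit_ID' with comma-separated company IDs.
# Names missing from the mapping are skipped, as are non-string cells such as NaN.
def names_to_ids(names):
    if not isinstance(names, str):
        return ''
    return ','.join(str(company_to_id[name]) for name in names.split(', ') if name in company_to_id)

if 'Explicit_Company' in df_company.columns:
    df_company['Explicit_ID'] = df_company['Explicit_Company'].apply(names_to_ids)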


This cell derives implicit relationships between companies. For each news item, it looks up every relationship in `df_relationships` that starts at one of the explicitly mentioned companies and compares the item's `label` with the relationship's `:TYPE`. When the two values match (both 0 or both 1), the related company is recorded as implicitly positive; when they differ, it is recorded as implicitly negative. The results are stored as company names in two new columns, `Implicit_Positive_Company` and `Implicit_Negative_Company`.
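The matching rule used in the next cell (equal label and relation type gives a positive link, unequal values give a negative one) can be isolated in a few lines. The sketch below is illustrative only: the helper names `implicit_polarity` and `split_neighbours` and the toy relationship list are hypothetical, and it uses plain Python tuples instead of the notebook's DataFrames so that the rule can be checked in isolation.

```python
def implicit_polarity(news_label, relation_type):
    """Return 1 when the news label and the relation type agree, else 0."""
    return int(news_label == relation_type)


def split_neighbours(start_id, news_label, relationships):
    """Split the companies related to start_id into (positive, negative) ID lists."""
    positive, negative = [], []
    for rel_start, rel_end, rel_type in relationships:
        if rel_start != start_id:
            continue
        if implicit_polarity(news_label, rel_type):
            positive.append(rel_end)
        else:
            negative.append(rel_end)
    return positive, negative


# Hypothetical toy data as (start_id, end_id, type) tuples:
# company 1 has a type-1 relation with company 2 and a type-0 relation with company 3.
relationships = [(1, 2, 1), (1, 3, 0)]

print(split_neighbours(1, 1, relationships))  # ([2], [3])
print(split_neighbours(1, 0, relationships))  # ([3], [2])
```

With label 1, the type-1 neighbour lands in the positive list and the type-0 neighbour in the negative list; with label 0 the two lists swap.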

In [5]:
# Initialize the new columns with default values
df_company['Implicit_Positive_Company'] = "None"
df_company['Implicit_Negative_Company'] = "None"

# Convert ':START_ID' column in df_relationships to integers
df_relationships[':START_ID'] = pd.to_numeric(df_relationships[':START_ID'], errors='coerce').astype('Int64')

for index, row in df_company.iterrows():
    # Parse the comma-separated explicit company IDs for this news item
    start_ids = [int(cid) for cid in row['Explicit_ID'].split(',') if cid.strip()]
    type_value = row['label']
    imp_pos = []
    imp_neg = []

    for start_id in start_ids:
        matching_rows = df_relationships[df_relationships[':START_ID'] == start_id]

        for _, relation in matching_rows.iterrows():
            # Skip rows whose label or relation type is not 0/1, so that no
            # stale result from an earlier iteration is reused
            if type_value not in (0, 1) or relation[':TYPE'] not in (0, 1):
                continue

            end_id = relation[':END_ID']
            # Matching label and relation type -> implicit positive; otherwise negative
            if type_value == relation[':TYPE']:
                imp_pos.append(end_id)
            else:
                imp_neg.append(end_id)

    # Convert IDs to company names and join them into a string
    imp_pos_str = ', '.join([id_to_company.get(ids, '') for ids in imp_pos])
    imp_neg_str = ', '.join([id_to_company.get(ids, '') for ids in imp_neg])

    # Assign the resulting strings to the corresponding columns
    if imp_pos_str:
        df_company.at[index, 'Implicit_Positive_Company'] = imp_pos_str

    if imp_neg_str:
        df_company.at[index, 'Implicit_Negative_Company'] = imp_neg_str

# The df_company DataFrame now contains the updated Implicit_Positive_Company and Implicit_Negative_Company columns.
df_company

Unnamed: 0,NewsID,NewsContent,Explicit_Company,label,Explicit_ID,Implicit_Positive_Company,Implicit_Negative_Company
0,1,本报记者 田雨 李京华 中国建设银行股份有限公司原董事长张恩照受贿案３日一审宣...,建设银行,0,3014,,"比亚迪, 中国银行, 中国铁建, 上海银行, 新华联, 中国船舶, 招商证券, 来伊份, 我..."
1,2,中国农业银行信用卡中心由北京搬到上海了！ 农行行长杨明生日前在信用卡中心揭牌仪式上...,农业银行,1,2914,"京东方, 中国银行, 金风科技, 邮储银行, 中国船舶, 交通银行, 建设银行, 金地集团,...",
2,3,在新基金快速发行以及申购资金回流的情况下，市场总体上呈现资金流动性过剩格局，考虑到现阶段...,"外运发展, 中国国航",1,933809,"中国外运, 中国外运, 南方航空, 南方航空, 吉祥航空, 春秋航空, 工商银行, 中国交建...",
3,4,胜利股份（000407）公司子公司填海造地2800亩，以青岛的地价估算，静态价值在10亿...,胜利股份,1,2878,特锐德,
4,8,由于全球最大的俄罗斯Uralkaly钾矿被淹，产量大减，同时满洲里口岸铁路在修复线，导致...,冠农股份,1,2799,"藏格控股, 富邦股份",
...,...,...,...,...,...,...,...
187293,1037007,10月13日，今日共有43只涨停股，5只跌停股。其中，涨停股主要集中在华为概念股、减肥药概念...,"模塑科技, 龙版传媒, 莎普爱思, 光洋股份, 通化金马, 圣龙股份, 通宇通讯, 欧菲光",0,54223345723460534355732202471,"江苏银行, 京东方, 东吴证券, 闻泰科技, 立讯精密","宁波华翔, 比亚迪, 长城汽车, 长鹰信质, 格力电器, 长城汽车, 新晨科技, 硕贝德, ..."
187294,1037009,吉电股份10月13日在交易所互动平台中披露，截至10月10日公司股东户数为171303户，较...,吉电股份,0,3764,,"东旭蓝天, 智慧能源, 金风科技, 京能电力, 江苏索普, 智慧能源, 智慧能源"
187295,1037025,10月12日晚间，三星医疗发布2023年前三季度业绩预告，公司预计前三季度实现归属于母公司所...,三星医疗,1,,,
187296,1037030,每经AI快讯，有投资者在投资者互动平台提问：公司领导，请问公司经营是不是出现重大问题了，股票...,亿华通,0,496,,"中国船舶, 仕佳光子, 百奥泰"




In [6]:
# Drop the helper ID column and save the result
del df_company['Explicit_ID']
df_company.to_csv('Task2.csv', index=False)

In [7]:
df_company

Unnamed: 0,NewsID,NewsContent,Explicit_Company,label,Implicit_Positive_Company,Implicit_Negative_Company
0,1,本报记者 田雨 李京华 中国建设银行股份有限公司原董事长张恩照受贿案３日一审宣...,建设银行,0,,"比亚迪, 中国银行, 中国铁建, 上海银行, 新华联, 中国船舶, 招商证券, 来伊份, 我..."
1,2,中国农业银行信用卡中心由北京搬到上海了！ 农行行长杨明生日前在信用卡中心揭牌仪式上...,农业银行,1,"京东方, 中国银行, 金风科技, 邮储银行, 中国船舶, 交通银行, 建设银行, 金地集团,...",
2,3,在新基金快速发行以及申购资金回流的情况下，市场总体上呈现资金流动性过剩格局，考虑到现阶段...,"外运发展, 中国国航",1,"中国外运, 中国外运, 南方航空, 南方航空, 吉祥航空, 春秋航空, 工商银行, 中国交建...",
3,4,胜利股份（000407）公司子公司填海造地2800亩，以青岛的地价估算，静态价值在10亿...,胜利股份,1,特锐德,
4,8,由于全球最大的俄罗斯Uralkaly钾矿被淹，产量大减，同时满洲里口岸铁路在修复线，导致...,冠农股份,1,"藏格控股, 富邦股份",
...,...,...,...,...,...,...
187293,1037007,10月13日，今日共有43只涨停股，5只跌停股。其中，涨停股主要集中在华为概念股、减肥药概念...,"模塑科技, 龙版传媒, 莎普爱思, 光洋股份, 通化金马, 圣龙股份, 通宇通讯, 欧菲光",0,"江苏银行, 京东方, 东吴证券, 闻泰科技, 立讯精密","宁波华翔, 比亚迪, 长城汽车, 长鹰信质, 格力电器, 长城汽车, 新晨科技, 硕贝德, ..."
187294,1037009,吉电股份10月13日在交易所互动平台中披露，截至10月10日公司股东户数为171303户，较...,吉电股份,0,,"东旭蓝天, 智慧能源, 金风科技, 京能电力, 江苏索普, 智慧能源, 智慧能源"
187295,1037025,10月12日晚间，三星医疗发布2023年前三季度业绩预告，公司预计前三季度实现归属于母公司所...,三星医疗,1,,
187296,1037030,每经AI快讯，有投资者在投资者互动平台提问：公司领导，请问公司经营是不是出现重大问题了，股票...,亿华通,0,,"中国船舶, 仕佳光子, 百奥泰"
