# 1. Education & GDP


## 1.1. Education Spending: How is Education Financed, How Much Do We Spend on It, and What Are the Returns?

**Authors**: Max Roser and Esteban Ortiz-Ospina

**Date**: June 23, 2016

**url**: [Education Spending](https://ourworldindata.org/financing-education)


### Preview of article

In [2]:
import requests
from bs4 import BeautifulSoup

url = 'https://ourworldindata.org/financing-education'

response = requests.get(url)
response.encoding = response.apparent_encoding

if response.status_code == 200:
    soup = BeautifulSoup(response.text, 'html.parser')

    article_content = []

    if soup.find('h1'):
        article_content.append(soup.find('h1').get_text())

    for header in soup.find_all(['h2', 'h3', 'h4', 'h5', 'h6']):
        article_content.append(header.get_text())

    for paragraph in soup.find_all('p'):
        article_content.append(paragraph.get_text())

    for content in article_content:
        print(content)
else:
    print('ERROR:', response.status_code)

Education Spending
How is education financed? How much do we spend on it? What are the returns?
When did the provision of education first become a public policy priority?
How did the US finance the expansion of public education?
How did France finance the expansion of public education?
In the US growth in education expenditure was characterized by growth specifically in the public sector
When did the expansion of basic education become a global phenomenon?
Education inequality is falling around the world
Education inequality can decline rapidly across all levels of education – South Korea is an example
Is funding for education expanding?
Is additional funding for education taking resources from other sectors?
European countries tend to assign a lower share of public budgets to education, relative to the amount of their income that is devoted to education
In European countries the weight of primary education within total education spending is lower than in other countries
In high-income

### Chart in the Article

![1.1.1.png](attachment:6fa3822e-d104-4101-b429-8aa90cf6666f.png)

### OCR Data Process Method A: WebPlotDigitizer


In [3]:
import pandas as pd
### Load the education expenditure data digitized with WebPlotDigitizer from the chart in the article by Max Roser and Esteban Ortiz-Ospina
Expenditure_on_Education_raw = pd.read_csv('Expenditure_on_Education_raw.csv')
Expenditure_on_Education_raw.head()

Unnamed: 0,1970.989032369146,0.013901108595352013
0,1971.129804,0.014041
1,1971.261777,0.014386
2,1971.39375,0.014645
3,1971.516925,0.014909
4,1971.6401,0.015176


In [4]:
### Truncate the first column to obtain the years as integers
Expenditure_on_Education_raw.columns = ['Year','Expenditure on Education']
Expenditure_on_Education_raw['Year'] = Expenditure_on_Education_raw['Year'].apply(int)
Expenditure_on_Education_raw.head()

Unnamed: 0,Year,Expenditure on Education
0,1971,0.014041
1,1971,0.014386
2,1971,0.014645
3,1971,0.014909
4,1971,0.015176
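
Note that `int` truncates toward zero rather than rounding to the nearest year, so every point digitized within 1971 is assigned to 1971. The short sketch below contrasts the two choices using x-values from the preview above; the variable names are illustrative and not part of the notebook.

```python
import math

# x-values as exported by WebPlotDigitizer (copied from the preview above)
digitized_x = [1971.129804, 1971.516925, 1971.6401]

# What the notebook does: drop the fractional part
truncated = [int(x) for x in digitized_x]

# Alternative: assign each point to the nearest year
nearest = [math.floor(x + 0.5) for x in digitized_x]

print(truncated)  # [1971, 1971, 1971]
print(nearest)    # [1971, 1972, 1972]
```

Truncation treats a point at 1971.64 as an observation for 1971, which fits a chart whose x-axis ticks mark the start of each year. If the ticks mark mid-year values instead, rounding to the nearest year may be the better fit.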


In [6]:
### Several points are digitized for each year, so we average them to keep a single value per year
result1_1 = Expenditure_on_Education_raw.groupby('Year')['Expenditure on Education'].mean().reset_index()
result1_1.head()

Unnamed: 0,Year,Expenditure on Education
0,1971,0.015041
1,1972,0.016205
2,1973,0.016971
3,1974,0.017545
4,1975,0.017408
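
The preview in cell `In [3]` shows the first digitized point (`1970.989...`, `0.0139...`) used as the column labels, which suggests that the exported CSV has no header row and that this first point is dropped once the columns are renamed. A minimal sketch of a reusable helper that keeps it, assuming a headerless two-column export, is shown below. The function name `yearly_mean` and the three-point sample are hypothetical.

```python
import io
import pandas as pd

def yearly_mean(csv_source, value_name):
    """Read a headerless two-column digitizer export and average by year."""
    # header=None keeps the first digitized point as data instead of
    # letting pandas consume it as column names.
    raw = pd.read_csv(csv_source, header=None, names=['Year', value_name])
    raw['Year'] = raw['Year'].astype(int)  # truncates, e.g. 1971.26 -> 1971
    return raw.groupby('Year')[value_name].mean().reset_index()

# Hypothetical three-point export, for illustration only
sample = io.StringIO("1970.99,0.0139\n1971.13,0.0140\n1971.26,0.0144\n")
result = yearly_mean(sample, 'Expenditure on Education')
print(result)
```

The same helper could be applied to each of the WebPlotDigitizer exports used in the later sections by passing the file name and the desired column name.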


# 2. Working Hours & GDP

## 2.1 Do workers in richer countries work longer hours?

**Authors**: Charlie Giattino and Esteban Ortiz-Ospina

**Date**: December 21, 2021

**url**: [rich-poor-working-hours](https://ourworldindata.org/rich-poor-working-hours)


### Preview of article

In [6]:
url = 'https://ourworldindata.org/rich-poor-working-hours'

response = requests.get(url)
response.encoding = response.apparent_encoding

if response.status_code == 200:
    soup = BeautifulSoup(response.text, 'html.parser')

    article_content = []

    if soup.find('h1'):
        article_content.append(soup.find('h1').get_text())

    for header in soup.find_all(['h2', 'h3', 'h4', 'h5', 'h6']):
        article_content.append(header.get_text())

    for paragraph in soup.find_all('p'):
        article_content.append(paragraph.get_text())

    for content in article_content:
        print(content)
else:
    print('ERROR:', response.status_code)

Do workers in richer countries work longer hours?
Workers in richer countries tend to work fewer hours than those in poorer countries.
Summary
How much time do people spend working?
How do people spend their time?
Endnotes
Cite this work
Reuse this work freely
Workers in richer countries tend to work fewer hours than those in poorer countries. This is because in richer countries workers are able to produce more with each hour of work, which translates into higher incomes and the ability to work less.
The large differences in working hours across countries have important implications for the way we think about the economic progress made in the last two centuries and the nature of inequality between countries today.
Economic prosperity in different places across our world today is vastly unequal. People in Switzerland, one of the richest countries in the world, have an average income that is more than 20-times higher than that of people in Cambodia.1 Life in these two countries can look 

### Table in the Article

![2.1.1.png](attachment:f499cc7c-4f8e-464f-92fd-d95166019d30.png)
![2.1.2.png](attachment:3a042aa0-9aff-496d-8419-a5c0567a8bb7.png)

### OCR Data Process Method B: PaddleOCR

To run the following code, please first read and configure your environment as per the instructions found here: [PaddleOCR ppstructure README](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README_ch.md).

If your computer has an NVIDIA® GPU, ensure the following conditions are met and install the GPU version of PaddlePaddle:

- **CUDA Toolkit 11.2** in conjunction with **cuDNN v8.2.1**. If you plan to use PaddleTensorRT for inference, you'll also need **TensorRT 8.2.4.2**.
- Install the CUDA 11.2-compatible build of PaddlePaddle using the command:

    ```bash
    python -m pip install paddlepaddle-gpu==2.6.1.post112 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html
    ```

- To verify the installation, import paddle and then execute:

    ```python
    import paddle
    paddle.utils.run_check()
    ```

    If you see "PaddlePaddle is installed successfully!", it means you have successfully installed PaddlePaddle.

- To fully clone the PaddleOCR source code (for both prediction and training), use:

    ```bash
    git clone https://github.com/PaddlePaddle/PaddleOCR
    ```


In [7]:
import os
import cv2
from paddleocr import PPStructure, draw_structure_result, save_structure_res

In [8]:
table_engine = PPStructure(show_log=True)
save_folder = 'C:/Users/qyh18/02 NUS Semester2/DE-on-GDP-and-Economy/Article Data Fetching(OCR+SVG)'
img_path_1 = 'C:/Users/qyh18/02 NUS Semester2/DE-on-GDP-and-Economy/Article Data Fetching(OCR+SVG)/2.1.1.png'
img_1 = cv2.imread(img_path_1)

[2024/04/05 12:25:53] ppocr DEBUG: Namespace(help='==SUPPRESS==', use_gpu=False, use_xpu=False, use_npu=False, ir_optim=True, use_tensorrt=False, min_subgraph_size=15, precision='fp32', gpu_mem=500, gpu_id=0, image_dir=None, page_num=0, det_algorithm='DB', det_model_dir='C:\\Users\\qyh18/.paddleocr/whl\\det\\ch\\ch_PP-OCRv4_det_infer', det_limit_side_len=960, det_limit_type='max', det_box_type='quad', det_db_thresh=0.3, det_db_box_thresh=0.6, det_db_unclip_ratio=1.5, max_batch_size=10, use_dilation=False, det_db_score_mode='fast', det_east_score_thresh=0.8, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_sast_score_thresh=0.5, det_sast_nms_thresh=0.2, det_pse_thresh=0, det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_scale=1, scales=[8, 16, 32], alpha=1.0, beta=1.0, fourier_degree=5, rec_algorithm='SVTR_LCNet', rec_model_dir='C:\\Users\\qyh18/.paddleocr/whl\\rec\\ch\\ch_PP-OCRv4_rec_infer', rec_image_inverse=True, rec_image_shape='3, 48, 320', rec_batch_num=6, max_text_len

In [9]:
result_1 = table_engine(img_1)
save_structure_res(result_1, save_folder, os.path.basename(img_path_1).split('.')[0])

[2024/04/05 12:25:57] ppocr DEBUG: dt_boxes num : 156, elapse : 0.44876551628112793
[2024/04/05 12:26:20] ppocr DEBUG: rec_res num  : 156, elapse : 22.36610198020935
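
The name passed to `save_structure_res` comes from `os.path.basename(...).split('.')[0]`, which keeps only the text before the first dot. For file names such as `2.1.1.png` and `2.1.2.png` this yields `2` in both cases, which is consistent with the later cells reading both spreadsheets from a subfolder named `2`. The sketch below compares that expression with `os.path.splitext`; the relative paths are hypothetical stand-ins for the absolute paths used in the notebook.

```python
import os

# Hypothetical relative paths standing in for the notebook's image paths
img_paths = ['figures/2.1.1.png', 'figures/2.1.2.png']

names = [os.path.basename(p) for p in img_paths]

# Expression used in the notebook: text before the first dot
first_tokens = [n.split('.')[0] for n in names]

# Alternative: strip only the final extension
stems = [os.path.splitext(n)[0] for n in names]

print(first_tokens)  # ['2', '2']
print(stems)         # ['2.1.1', '2.1.2']
```

Using the full stem would give each image its own output folder, at the cost of having to update the spreadsheet paths read in the following cells.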


In [10]:
table_engine = PPStructure(show_log=True)
save_folder = 'C:/Users/qyh18/02 NUS Semester2/DE-on-GDP-and-Economy/Article Data Fetching(OCR+SVG)'

img_path_2 = 'C:/Users/qyh18/02 NUS Semester2/DE-on-GDP-and-Economy/Article Data Fetching(OCR+SVG)/2.1.2.png'
img_2 = cv2.imread(img_path_2)
result_2 = table_engine(img_2)
save_structure_res(result_2, save_folder, os.path.basename(img_path_2).split('.')[0])


Combine the two tables

In [11]:
file_path1 = 'C:/Users/qyh18/02 NUS Semester2/DE-on-GDP-and-Economy/Article Data Fetching(OCR+SVG)/2/[4, 11, 2829, 1263]_0.xlsx'
file_path2 = 'C:/Users/qyh18/02 NUS Semester2/DE-on-GDP-and-Economy/Article Data Fetching(OCR+SVG)/2/[4, 11, 2829, 1262]_0.xlsx'
output_path = 'C:/Users/qyh18/02 NUS Semester2/DE-on-GDP-and-Economy/Article Data Fetching(OCR+SVG)/2/combined.xlsx'

df1 = pd.read_excel(file_path1)
df2 = pd.read_excel(file_path2)

result2_1 = pd.concat([df1, df2], ignore_index=True)
result2_1.head()

Unnamed: 0,Entity,Code,Year,Annual Working Hours,GDP per Capita,Population
0,China,CHN,1970,1976.312,1292.507,822534464.0
1,China,CHN,1971,1975.7937,1382.5481,843285440.0
2,China,CHN,1972,1975.5077,1310.6804,862840384.0
3,China,CHN,1973,1975.3647,1396.8113,881652096.0
4,China,CHN,1974,1975.1897,1392.1698,899367680.0
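
Because the two spreadsheets come from separate screenshots of one table, it may be worth checking the combined result for rows that appear in both halves. The sketch below shows one way to do this with `duplicated` and `drop_duplicates`. The two small frames are hypothetical stand-ins for `df1` and `df2` (with an overlap added on purpose), not the actual OCR output.

```python
import pandas as pd

# Hypothetical stand-ins for the two OCR'd halves of the table
df1 = pd.DataFrame({'Entity': ['China', 'China'], 'Year': [1970, 1971],
                    'Annual Working Hours': [1976.312, 1975.7937]})
df2 = pd.DataFrame({'Entity': ['China', 'China'], 'Year': [1971, 1972],
                    'Annual Working Hours': [1975.7937, 1975.5077]})

combined = pd.concat([df1, df2], ignore_index=True)

# Rows that repeat an (Entity, Year) pair, e.g. where two screenshots overlap
overlap = combined[combined.duplicated(subset=['Entity', 'Year'], keep='first')]

# Keep one row per (Entity, Year) and restore chronological order
deduplicated = (combined
                .drop_duplicates(subset=['Entity', 'Year'], keep='first')
                .sort_values(['Entity', 'Year'])
                .reset_index(drop=True))

print(overlap)
print(deduplicated)
```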


# 3. Consumer Price Index

## 3.1 How are incomes adjusted for inflation?

**Authors**: Joe Hasell and Max Roser

**Date**: July 11, 2023

**url**: [Consumer Price Index](https://ourworldindata.org/how-are-incomes-adjusted-for-inflation)


In [6]:
#### Method A: WebPlotDigitizer
Consumer_Price_Index_raw = pd.read_csv('Consumer Price Index_raw.csv')
Consumer_Price_Index_raw.columns = ['Year','Consumer Price Index']
Consumer_Price_Index_raw['Year'] = Consumer_Price_Index_raw['Year'].apply(int)
result3_1 = Consumer_Price_Index_raw.groupby('Year')['Consumer Price Index'].mean().reset_index()
result3_1

Unnamed: 0,Year,Consumer Price Index
0,1986,27.135625
1,1987,30.278093
2,1988,35.876417
3,1989,39.59041
4,1990,40.856244
5,1991,42.842797
6,1992,47.239339
7,1993,56.963664
8,1994,66.84384
9,1995,76.872817


# 4. Government spending as a share of GDP

## 4.1. What do governments spend their financial resources on?

**Authors**: Max Roser and Esteban Ortiz-Ospina

**Date**: March 2023

**url**: [Government Spending](https://ourworldindata.org/government-spending) 

In [7]:
#### Method A: WebPlotDigitizer
Government_Expenditure_raw = pd.read_csv('Government Expenditure_raw.csv')
Government_Expenditure_raw.columns = ['Year','Government Expenditure']
Government_Expenditure_raw['Year'] = Government_Expenditure_raw['Year'].apply(int)
result4_1 = Government_Expenditure_raw.groupby('Year')['Government Expenditure'].mean().reset_index()
result4_1

Unnamed: 0,Year,Government Expenditure
0,1982,0.278316
1,1983,0.269566
2,1984,0.257825
3,1985,0.253251
4,1986,0.239202
5,1987,0.217298
6,1988,0.211222
7,1989,0.211321
8,1990,0.198848
9,1991,0.180493


# 5. Foreign direct investment as share of GDP

## 5.1. Revitalize the global partnership for sustainable development

**Authors**: Our World in Data team

**Date**: July 18, 2023

**url**: [Foreign direct investment](https://ourworldindata.org/sdgs/global-partnerships)

In [8]:
#### Method A: WebPlotDigitizer
Foreign_Direct_Investment_raw = pd.read_csv('Foreign Direct Investment as a share of GDP_raw.csv')
Foreign_Direct_Investment_raw.columns = ['Year','Foreign Direct Investment']
Foreign_Direct_Investment_raw['Year'] = Foreign_Direct_Investment_raw['Year'].apply(int)
result5_1 = Foreign_Direct_Investment_raw.groupby('Year')['Foreign Direct Investment'].mean().reset_index()
result5_1

Unnamed: 0,Year,Foreign Direct Investment
0,1981,0.000196
1,1982,0.000299
2,1983,0.000504
3,1984,0.001547
4,1985,0.001653
5,1986,0.00186
6,1987,0.002513
7,1988,0.002493
8,1989,0.002279
9,1990,0.002382


# 6. Unemployment rate

## 6.1. Promote sustained, inclusive and sustainable economic growth, full and productive employment and decent work for all

**Authors**: Our World in Data team

**Date**: July 18, 2023

**url**: [Unemployment Rate](https://ourworldindata.org/sdgs/economic-growth) 

In [10]:
#### Method A: WebPlotDigitizer
Unemployment_rate_raw = pd.read_csv('Unemployed rate_raw.csv')
Unemployment_rate_raw.columns = ['Year','Unemployment rate']
Unemployment_rate_raw['Year'] = Unemployment_rate_raw['Year'].apply(int)
result6_1 = Unemployment_rate_raw.groupby('Year')['Unemployment rate'].mean().reset_index()
result6_1

Unnamed: 0,Year,Unemployment rate
0,1991,0.023748
1,1992,0.025705
2,1993,0.028493
3,1994,0.029797
4,1995,0.030949
5,1996,0.032059
6,1997,0.032393
7,1998,0.032503
8,1999,0.032752
9,2000,0.036668


# 7. Health Expenditure

## 7.1. How is healthcare financed? How much do we spend on it? What are the returns?

**Authors**: Max Roser and Esteban Ortiz-Ospina

**Date**: June 16, 2017

**url**: [Health Expenditure](https://ourworldindata.org/financing-healthcare)

In [11]:
#### Method A: WebPlotDigitizer
Government_health_expenditure_raw = pd.read_csv('Government health expenditure as a share of GDP_raw.csv')
Government_health_expenditure_raw.columns = ['Year','Government health expenditure']
Government_health_expenditure_raw['Year'] = Government_health_expenditure_raw['Year'].apply(int)
result7_1 = Government_health_expenditure_raw.groupby('Year')['Government health expenditure'].mean().reset_index()
result7_1

Unnamed: 0,Year,Government health expenditure
0,2000,0.009749
1,2001,0.010493
2,2002,0.011872
3,2003,0.012719
4,2004,0.013313
5,2005,0.013762
6,2006,0.014078
7,2007,0.015707
8,2008,0.019483
9,2009,0.021559


# 8. Life Expectancy

## 8.1. How is healthcare financed? How much do we spend on it? What are the returns?

**Authors**: Max Roser and Esteban Ortiz-Ospina

**Date**: June 16, 2017

**url**: [Life Expectancy](https://ourworldindata.org/financing-healthcare)

In [12]:
#### Method A: WebPlotDigitizer
Life_Expectancy_raw = pd.read_csv('Life Expectancy_raw.csv')
Life_Expectancy_raw.columns = ['Year','Life Expectancy']
Life_Expectancy_raw['Year'] = Life_Expectancy_raw['Year'].apply(int)
result8_1 = Life_Expectancy_raw.groupby('Year')['Life Expectancy'].mean().reset_index()
result8_1

Unnamed: 0,Year,Life Expectancy
0,1971,57.47211
1,1972,58.3886
2,1973,59.367887
3,1974,60.144828
4,1975,60.702793
5,1976,61.613119
6,1977,62.365282
7,1978,62.818361
8,1979,63.483663
9,1980,64.065235


# 9. Government Revenues

## 9.1. Revitalize the global partnership for sustainable development

**Authors**: Our World in Data team

**Date**: July 18, 2023

**url**: [Government Revenues](https://ourworldindata.org/sdgs/global-partnerships)

In [13]:
#### Method A: WebPlotDigitizer
Government_revenues_raw = pd.read_csv('Government revenues as a share of GDP_raw.csv')
Government_revenues_raw.columns = ['Year','Government revenues']
Government_revenues_raw['Year'] = Government_revenues_raw['Year'].apply(int)
result9_1 = Government_revenues_raw.groupby('Year')['Government revenues'].mean().reset_index()
result9_1

Unnamed: 0,Year,Government revenues
0,2000,0.081509
1,2001,0.087565
2,2002,0.091519
3,2003,0.094892
4,2004,0.169057
5,2005,0.253406
6,2006,0.296376
7,2007,0.310111
8,2008,0.266759
9,2009,0.270304
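
Digitized series can contain abrupt jumps, for example when the digitizer picks up points from a neighbouring line or a gridline. The sketch below flags large year-over-year changes with `pct_change`, using the values printed above. The 40% threshold is an arbitrary illustrative cutoff, not something taken from the article.

```python
import pandas as pd

# Values copied from the result9_1 output above
revenues = pd.DataFrame({
    'Year': list(range(2000, 2010)),
    'Government revenues': [0.081509, 0.087565, 0.091519, 0.094892, 0.169057,
                            0.253406, 0.296376, 0.310111, 0.266759, 0.270304],
})

# Relative change from the previous year (NaN for the first row)
revenues['change'] = revenues['Government revenues'].pct_change()

threshold = 0.4  # illustrative cutoff, an assumption
flagged = revenues[revenues['change'].abs() > threshold]

print(flagged[['Year', 'change']])
```

With this cutoff the years 2004 and 2005 are flagged. A flag does not mean the values are wrong, only that they may merit a second look against the original chart.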


# 10. Age Dependency Ratio

## 10.1. What is the age profile of populations around the world? How did it change and what will the age structure of populations look like in the future?

**Authors**: Hannah Ritchie and Max Roser

**Date**: September 2019

**url**: [Age Dependency Ratio](https://ourworldindata.org/age-structure)

In [14]:
#### Method A: WebPlotDigitizer
Age_dependency_ratio_raw = pd.read_csv('Age dependency ratio_raw.csv')
Age_dependency_ratio_raw.columns = ['Year','Age dependency ratio']
Age_dependency_ratio_raw['Year'] = Age_dependency_ratio_raw['Year'].apply(int)
result10_1 = Age_dependency_ratio_raw.groupby('Year')['Age dependency ratio'].mean().reset_index()
result10_1

Unnamed: 0,Year,Age dependency ratio
0,1970,0.801329
1,1971,0.798023
2,1972,0.789539
3,1973,0.786612
4,1974,0.789361
5,1975,0.792524
6,1976,0.79237
7,1977,0.770575
8,1978,0.731298
9,1979,0.702196


# 11. Research & Development Spending as a Share of GDP

## 11.1. Build resilient infrastructure, promote inclusive and sustainable industrialization and foster innovation

**Authors**: Our World in Data team

**Date**: July 18, 2023

**url**: [Research & Development Spending as a Share of GDP](https://ourworldindata.org/sdgs/industry-innovation-infrastructure)

In [15]:
#### Method A: WebPlotDigitizer
Research_development_spending_raw = pd.read_csv('Research and development spending as a share of GDP_raw.csv')
Research_development_spending_raw.columns = ['Year','Research and development spending']
Research_development_spending_raw['Year'] = Research_development_spending_raw['Year'].apply(int)
result11_1 = Research_development_spending_raw.groupby('Year')['Research and development spending'].mean().reset_index()
result11_1

Unnamed: 0,Year,Research and development spending
0,1996,0.006249
1,1997,0.006535
2,1998,0.007251
3,1999,0.008491
4,2000,0.009267
5,2001,0.010155
6,2002,0.010956
7,2003,0.011746
8,2004,0.012646
9,2005,0.01336


# 12. Taxes on income vs. taxes on goods and services & GDP

## 12.1 Who is paying how much and how do tax systems differ?

**Authors**: Esteban Ortiz-Ospina and Max Roser

**Date**: September 2016 

**url**: [Taxation](https://ourworldindata.org/taxation)


In [19]:
table_engine = PPStructure(show_log=True)
save_folder = 'C:/Users/qyh18/02 NUS Semester2/DE-on-GDP-and-Economy/Article Data Fetching(OCR+SVG)'

img_path_2 = 'C:/Users/qyh18/02 NUS Semester2/DE-on-GDP-and-Economy/Article Data Fetching(OCR+SVG)/12.1.png'
img_2 = cv2.imread(img_path_2)
result_2 = table_engine(img_2)
save_structure_res(result_2, save_folder, os.path.basename(img_path_2).split('.')[0])


In [21]:
result_12 = pd.read_excel('C:/Users/qyh18/02 NUS Semester2/DE-on-GDP-and-Economy/Article Data Fetching(OCR+SVG)/12/[0, 0, 887, 851]_0.xlsx')
result_12.columns = [
    'Entity',
    'Code',
    'Year',
    'Taxes on goods and services (as a share of GDP)',
    'Taxes on income, profits and capital gains (TIPCG) (as a share of GDP)',
    'Population'
]
result_12.head()

Unnamed: 0,Entity,Code,Year,Taxes on goods and services (as a share of GDP),"Taxes on income, profits and capital gains (TIPCG) (as a share of GDP)",Population
0,China,CHN,1989,9. 170754,4. 6433463,1134414720
1,China,CHN,1990,8.645571,4. 4651213,1153704192
2,China,CHN,1991,7.9783025,3.277696,1170626176
3,China,CHN,1992,7.6122904,2.988174,1183813376
4,China,CHN,1993,8.084106,2. 1756253,1195855616
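
Several cells in the output above contain a space after the decimal point (for example `9. 170754`), so those columns are read as text rather than numbers. A minimal sketch of one possible clean-up step is shown below. It removes all whitespace and converts with `pd.to_numeric`, using sample strings copied from the output; the variable names are illustrative.

```python
import pandas as pd

# Strings as they appear in the OCR output above
ocr_values = pd.Series(['9. 170754', '8.645571', '4. 6433463', '2. 1756253'])

# Remove all whitespace, then convert; anything still unparseable becomes NaN
cleaned = pd.to_numeric(
    ocr_values.astype(str).str.replace(r'\s+', '', regex=True),
    errors='coerce',
)

print(cleaned.tolist())
print(cleaned.isna().sum())  # number of cells that could not be parsed
```

Using `errors='coerce'` keeps the conversion from failing on other OCR artifacts, and counting the resulting `NaN` values shows how many cells still need manual correction.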


# 13. The Human Development Index and related indices

## 13.1 Human Development Index (HDI)

**Authors**: Bastian Herre and Pablo Arriagada

**Date**: November 01, 2023

**url**: [Human Development Index](https://ourworldindata.org/human-development-index)

**The Human Development Index (HDI)** is a summary measure of key dimensions of human development: a long and healthy
life, a good education, and a decent standard of living. Higher values indicate higher human development.

In [27]:
table_engine = PPStructure(show_log=True)
save_folder = 'C:/Users/qyh18/02 NUS Semester2/DE-on-GDP-and-Economy/Article Data Fetching(OCR+SVG)'

img_path = 'C:/Users/qyh18/02 NUS Semester2/DE-on-GDP-and-Economy/Article Data Fetching(OCR+SVG)/13.1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])


In [28]:
result_13_1 = pd.read_excel('C:/Users/qyh18/02 NUS Semester2/DE-on-GDP-and-Economy/Article Data Fetching(OCR+SVG)/13/[0, 0, 861, 793]_0.xlsx')
result_13_1.head()

Unnamed: 0,Entity,Code,Year,Human Development Index,GDP per capita,Population
0,China,CHN,1990,0. 484,1423.8964,1153704192
1,China,CHN,1991,0. 492,1534.7053,1170626176
2,China,CHN,1992,0. 504,1731.6572,1183813376
3,China,CHN,1993,0.515,1949.5343,1195855616
4,China,CHN,1994,0.525,2178.924,1207286656


## 13.2 Gender Development Index (GDI)

**The Gender Development Index (GDI)** measures gender inequalities in the achievement of key dimensions of human
development: a long and healthy life, a good education, and a decent standard of living. Values close to 1 indicate higher
gender equality.



In [29]:
table_engine = PPStructure(show_log=True)
save_folder = 'C:/Users/qyh18/02 NUS Semester2/DE-on-GDP-and-Economy/Article Data Fetching(OCR+SVG)'

img_path = 'C:/Users/qyh18/02 NUS Semester2/DE-on-GDP-and-Economy/Article Data Fetching(OCR+SVG)/13.2.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])


In [31]:
result_13_2 = pd.read_excel('C:/Users/qyh18/02 NUS Semester2/DE-on-GDP-and-Economy/Article Data Fetching(OCR+SVG)/13/[0, 0, 875, 793]_0.xlsx')
result_13_2.head()


Unnamed: 0,Entity,Code,Year,Gender Development Index,Human Development Index,Population
0,China,CHN,1990,0. 873,0. 484,1153704192
1,China,CHN,1991,0.875,0. 492,1170626176
2,China,CHN,1992,0.881,0.504,1183813376
3,China,CHN,1993,0.888,0.515,1195855616
4,China,CHN,1994,0.891,0.525,1207286656
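
The Human Development Index column appears in both `result_13_1` and `result_13_2`, so the two OCR passes can be checked against each other. The sketch below cleans the OCR'd strings, merges the two tables on `Year`, and lists any rows where the values disagree. The helper `to_number` and the three-row frames (copied from the outputs above) are illustrative.

```python
import pandas as pd

def to_number(series):
    """Strip whitespace that OCR inserted inside numbers and convert to float."""
    return pd.to_numeric(series.astype(str).str.replace(r'\s+', '', regex=True),
                         errors='coerce')

# First rows of the two OCR results shown above
hdi_13_1 = pd.DataFrame({'Year': [1990, 1991, 1992],
                         'Human Development Index': ['0. 484', '0. 492', '0. 504']})
hdi_13_2 = pd.DataFrame({'Year': [1990, 1991, 1992],
                         'Human Development Index': ['0. 484', '0. 492', '0.504']})

for frame in (hdi_13_1, hdi_13_2):
    frame['Human Development Index'] = to_number(frame['Human Development Index'])

# Align the two tables by year and compare the shared column
merged = hdi_13_1.merge(hdi_13_2, on='Year', suffixes=('_13_1', '_13_2'))
diff = (merged['Human Development Index_13_1']
        - merged['Human Development Index_13_2']).abs()
mismatches = merged[diff > 1e-9]

print(mismatches)  # an empty frame means the two OCR passes agree
```

The same comparison could be applied to other columns that appear in more than one table, such as `Population` or `GDP per capita`.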


# 14. Productivity & GDP per capita


**url**: [Productivity vs. GDP](https://ourworldindata.org/grapher/labor-productivity-vs-gdp-per-capita)

**Productivity** is measured as gross domestic product (GDP) per hour of work. This data is adjusted for inflation and
differences in the cost of living between countries.

**GDP per capita** is the "output, multiple price benchmarks" series.

In [35]:
table_engine = PPStructure(show_log=True)
save_folder = 'C:/Users/qyh18/02 NUS Semester2/DE-on-GDP-and-Economy/Article Data Fetching(OCR+SVG)'

img_path = 'C:/Users/qyh18/02 NUS Semester2/DE-on-GDP-and-Economy/Article Data Fetching(OCR+SVG)/14.1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])


In [36]:
result_14 = pd.read_excel('C:/Users/qyh18/02 NUS Semester2/DE-on-GDP-and-Economy/Article Data Fetching(OCR+SVG)/14/[0, 0, 708, 1225]_0.xlsx')
result_14.head()

Unnamed: 0,Entity,Code,Year,Productivity,GDP per capita,Population
0,China,CHN,1970,1. 4994909,1292.507,822534464
1,China,CHN,1971,1.5798508,1382.5481,843285440
2,China,CHN,1972,1. 492279,1310.6804,862840384
3,China,CHN,1973,1. 5905421,1396.8113,881652096
4,China,CHN,1974,1. 5722494,1392.1698,899367680
