Open datasets from Machine Learning Project
- Description: Excel workbook with the names of 17,099 products that were web scraped and anonymized from three online shops. Non-experts manually classified the products into European Classification of Individual Consumption according to Purpose (ECOICOP) categories. The original dataset was in Polish; translations into other languages are provided in separate sheets. The translations were produced with an online translator and were not reviewed for accuracy or appropriateness.
- Source: Statistics Poland
- Preview - Sheet "Polish" (header and first three rows)
produkt | kategoria |
Hejki - Emotki lizaki ręcznie robione o smakach owocowych | Wyroby cukiernicze |
100% Pur jus d orange sok pomarańczowy z miąższe... | Soki owocowe i warzywne |
100% sukraloza bez cukru (substancje słodzące) | Sztuczne substytuty cukru |
- Preview - Sheet "English" (header and first three rows)
produkt | kategoria |
Hejki - Emotes handmade lollipops with fruit flavors | Confectionery products |
100% Pur jus d orange orange juice with pulp ... | Fruit and vegetable juices |
100% sucralose without sugar (sweeteners) | Artificial sugar substitutes |
- Example of reading the dataset in Python
import pandas as pd
df = pd.read_excel('https://raw.githubusercontent.com/UNECE/ML_dataset/master/Stats%20Poland%20ECOICOP%20data.xlsx', sheet_name='Polish')
The data are available in Dutch, English, French, German, Italian, Polish and Spanish; select a language by changing the value of the sheet_name parameter.
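The language selection described above can be wrapped in a small helper. This is a minimal sketch, not part of the original project: the names `load_products`, `LANGUAGES` and `URL` are hypothetical, and it assumes each sheet is named exactly after its language as listed above (only "Polish" and "English" are confirmed by the previews). Reading .xlsx files with pandas also requires the openpyxl package.

```python
import pandas as pd

# URL taken from the example above.
URL = (
    "https://raw.githubusercontent.com/UNECE/ML_dataset/master/"
    "Stats%20Poland%20ECOICOP%20data.xlsx"
)

# Assumption: sheet names match the language names listed in the text.
LANGUAGES = ("Dutch", "English", "French", "German", "Italian", "Polish", "Spanish")


def load_products(language="English", source=URL):
    """Return the product sheet for one language as a DataFrame.

    Hypothetical helper: rejects unknown languages before any download
    is attempted, then delegates to pandas.read_excel.
    """
    if language not in LANGUAGES:
        raise ValueError(
            f"Unknown language {language!r}; expected one of {', '.join(LANGUAGES)}"
        )
    return pd.read_excel(source, sheet_name=language)
```

For example, `df_en = load_products("English")` would download the workbook and return the English sheet, while `load_products("Klingon")` raises a ValueError without touching the network.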
- Description: Excel sheet with quarterly electricity supplied by source (e.g. combustible fuels, hydro, nuclear), economic indicators (e.g. GDP, GVA) and other variables (e.g. population, sunspots) from 2000 Q1 to 2019 Q1.
- Source: Buelens, Bart, & Goyens, Anneleen. (2020). Energy Balance Flanders quarterly and monthly data and related auxiliary data (Version 1.0.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3596695
- Preview (header and first three rows)
Variable | Full name | 2000Q1 | 2000Q2 | 2000Q3 | ... | 2019Q1 |
EnrgCombustibleFuels | + Combustible Fuels GWh | 10108 | 7739 | 6984 | ... | 8681.063 |
EnrgNuclearNuclear | + Nuclear GWh | 11087 | 10770 | 11154 | ... | 9304.936 |
EnrgHydroHydro | + Hydro GWh | 446 | 357 | 397 | ... | 346.173 |
- Example of reading the dataset in Python
import pandas as pd
df = pd.read_excel('https://zenodo.org/record/3596695/files/VITO_EnergyBalanceDataML.xlsx', sheet_name='quarterly_txt')
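The preview shows one row per variable and one column per quarter. For time series work it is often easier to have one row per variable and quarter. The following is a minimal sketch of that reshaping, run on the three preview rows so that it works offline. The helper name `to_long` is hypothetical, and the sketch assumes the real sheet uses the same column labels as the preview ("Variable", "Full name", then quarters written like 2000Q1).

```python
import pandas as pd

# Sample rows copied from the preview (first three quarters only).
sample = pd.DataFrame(
    {
        "Variable": ["EnrgCombustibleFuels", "EnrgNuclearNuclear", "EnrgHydroHydro"],
        "Full name": ["+ Combustible Fuels GWh", "+ Nuclear GWh", "+ Hydro GWh"],
        "2000Q1": [10108, 11087, 446],
        "2000Q2": [7739, 10770, 357],
        "2000Q3": [6984, 11154, 397],
    }
)


def to_long(df):
    """Reshape the wide quarterly table into one row per (variable, quarter)."""
    long_df = df.melt(
        id_vars=["Variable", "Full name"],  # columns kept as identifiers
        var_name="quarter",                 # former column labels, e.g. "2000Q1"
        value_name="value",
    )
    # Parse labels such as "2000Q1" into quarterly periods.
    long_df["quarter"] = pd.PeriodIndex(long_df["quarter"], freq="Q")
    return long_df


long_sample = to_long(sample)
```

The same function could be applied to the `df` read from the quarterly_txt sheet, provided its layout matches the preview.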