This repository offers a wide range of datasets and queries from open data or our own practices (with necessary desensitization).

The datasets span many typical domains and vary in their characteristics (e.g., different numbers of columns and tuples).
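As a rough illustration of how the table and column counts listed for each dataset could be derived, the sketch below counts CSV files and header fields in a directory. The one-CSV-per-table layout and the name `summarize_dataset` are assumptions made for this example, not the repository's documented format or tooling.

```python
import csv
from pathlib import Path


def summarize_dataset(dataset_dir):
    """Return (table_count, column_count) for a directory of CSV files.

    Assumes one CSV file per table, each starting with a header row.
    This layout is hypothetical and only serves the illustration.
    """
    table_count = 0
    column_count = 0
    for csv_path in sorted(Path(dataset_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            # The first row is treated as the header; an empty file adds 0 columns.
            header = next(csv.reader(handle), [])
        table_count += 1
        column_count += len(header)
    return table_count, column_count
```

Calling `summarize_dataset` on a directory holding two CSV files with three and two header fields would return `(2, 5)`.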

Queries are real SQL statements that cover several kinds of workloads, such as feature extraction, transactions, and analytical queries (coming soon).
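To make the difference between these query types concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `sales` table, its columns, and both statements are hypothetical and are not taken from the repository's query files; the sketch assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id INTEGER, day INTEGER, amount REAL)")

# Transaction-style workload: a short write that commits or rolls back as a unit.
with conn:
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(1, 1, 10.0), (1, 2, 20.0), (1, 3, 30.0), (2, 1, 5.0)],
    )

# Feature-extraction-style workload: a per-store rolling sum over the
# current row and the one before it, ordered by day.
feature_rows = conn.execute(
    """
    SELECT store_id, day,
           SUM(amount) OVER (
               PARTITION BY store_id ORDER BY day
               ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
           ) AS amount_last_2_days
    FROM sales
    ORDER BY store_id, day
    """
).fetchall()
conn.close()
```

The write is a small atomic update, while the feature query derives a new column from a sliding window over existing rows.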

energy

| name | description | table number | column number | SQL | source |
| --- | --- | --- | --- | --- | --- |
| GEF2012-wind-forecasting | Hourly power generation at 7 wind farms | 10 | 61 | | kaggle |
| electric-power-consumption | Per capita energy consumption in Morocco | 1 | 9 | | kaggle |
| energydata_complete | | 2 | 59 | | |
| ashrae-energy-prediction | Energy usage from over 1,000 buildings over a three-year timeframe | 5 | 32 | | kaggle |

finance

| name | description | table number | column number | SQL | source |
| --- | --- | --- | --- | --- | --- |
| recruit-restaurant-visitor-forecasting | The browsing statistics of two restaurant websites | 8 | 28 | | kaggle |
| santander-customer-satisfaction | Hundreds of anonymized features that could reflect whether a customer is satisfied with their banking experience | 1 | 372 | | kaggle |
| GiveMeSomeCredit | Credit features of 250,000 borrowers in a banking scenario | 1 | 13 | | kaggle |
| daily-financial-news | Daily financial news for over 6,000 stocks | 2 | 12 | | tianchi |
| restaurant-revenue-prediction | Demographic, real estate, and commercial data for the investments of new restaurant sites | 2 | 85 | | kaggle |
| homesite-quote-conversion | An anonymized database of information on customer and sales activity | 2 | 597 | | kaggle |
| allstate-claims-severity | Insurance claims for worry-free customer experiences | 3 | 265 | | kaggle |
| tiantian | Price-related features constructed from fund market data downloaded from the TianTian Fund website | 1 | 332 | | tianchi |
| sberbank-russian-housing-market | Information about overall conditions in the country's economy and finance sector | 4 | 685 | | kaggle |
| dow_jones_index | | 1 | 16 | | |
| robinhood-stock-data | The historical stock price of Robinhood (ticker symbol HOOD) | 1 | 6 | | kaggle |
| porto-seguro-safe-driver-prediction | Features that affect whether an auto insurance policyholder files a claim | 1 | 60 | | kaggle |
| amex-default-prediction | | 4 | 384 | | |
| house-rent-prediction-dataset | Information on 4,700+ houses/apartments/flats available for rent | 1 | 12 | | kaggle |

health

| name | description | table number | column number | SQL | source |
| --- | --- | --- | --- | --- | --- |
| big-data-derby-2022 | A wealth of collected data, including measures for heart rate, EKG, longitudinal movement, and more | 3 | 24 | | kaggle |
| predict-west-nile-virus | Weather, location, testing, and spraying data | 5 | 51 | | kaggle |
| covid19-global-forecasting-week-2 | Statistics of COVID19 cases in various locations across the world | 1 | 6 | | kaggle |
| covid19-global-forecasting-week-5 | Statistics of COVID19 cases in various locations across the world | 1 | 9 | | kaggle |
| covid19-global-forecasting-week-4 | Statistics of COVID19 cases in various locations across the world | 1 | 6 | | kaggle |
| covid19-global-forecasting-week-1 | Statistics of COVID19 cases in various locations across the world | 1 | 8 | | kaggle |
| covid19-global-forecasting-week-3 | Statistics of COVID19 cases in various locations across the world | 1 | 6 | | kaggle |

media

| name | description | table number | column number | SQL | source |
| --- | --- | --- | --- | --- | --- |
| facebook-v-predicting-check-ins | | 3 | 13 | | |
| telstra-recruiting-network | | 7 | 18 | | |
| twitter-threads | Thread functionality in Twitter | 5 | 35 | | tianchi |
| spotify-app-reviews-2022 | Spotify reviews on Google Play Store | 1 | 6 | | kaggle |

meteorology

| name | description | table number | column number | SQL | source |
| --- | --- | --- | --- | --- | --- |
| PRSA2017_Data_20130301-20170228 | | 12 | 216 | | |
| AirQualityUCI | The responses of a gas multisensor device deployed on the field in an Italian city | 1 | 1 | | UCI_ML |
| historicalweatherdataforindiancities | Temperature data (Minimum, Average, Maximum) in degrees Centigrade and Precipitation data | 7 | 34 | | kaggle |

retails

| name | description | table number | column number | SQL | source |
| --- | --- | --- | --- | --- | --- |
| store-sales-time-series-forecasting | Dates, store and product information | 5 | 22 | | kaggle |
| coupon-purchase-prediction | A year of transactional data for 22,873 users on the site ponpare.jp | 9 | 80 | | kaggle |
| grupo-bimbo-inventory-demand | 9 weeks of sales transactions in Mexico | 6 | 28 | | kaggle |
| rossmann-store-sales | Historical sales data for 1,115 Rossmann stores | 2 | 19 | | kaggle |
| favorita-grocery-sales-forecasting | Dates, store and item information, whether that item was being promoted, as well as the unit sales | 6 | 26 | | kaggle |
| walmart-recruiting-store-sales-forecasting | | 5 | 26 | | |
| walmart-recruiting-sales-in-stormy-weather | Sales data for 111 products whose sales may be affected by the weather (such as milk, bread, umbrellas, etc.) | 4 | 28 | | kaggle |
| ecommerce-customerssales-record | Order Statistics | 1 | 41 | | kaggle |
| competitive-data-science-predict-future-sales | Daily historical sales data | 5 | 16 | | kaggle |
| m5-forecasting-accuracy | Item sales at stores in various locations for two 28-day time periods | 3 | 1965 | | kaggle |
| goods | Public production introduction information | 41 | 807 | | |
| material | Historical inventory statistics | 79 | 1265 | | |
| orders | Historical order details | 35 | 809 | | |
| shopmall | Comments and shelf status of goods | 35 | 809 | | |
| transaction | Order details (query only) | 50 | 1069 | | |

transport

| name | description | table number | column number | SQL | source |
| --- | --- | --- | --- | --- | --- |
| pkdd-15-taxi-trip-time-prediction-ii | | 4 | 24 | | kaggle |
| nyc-taxi-trip-duration | NYC Yellow Cab trip record data | 3 | 22 | | kaggle |
| taxi-trajectory | A complete year (from 01/07/2013 to 30/06/2014) of the trajectories for all the 442 taxis running in the city of Porto | 1 | 9 | | tianchi |
| pkdd-15-predict-taxi-service-trajectory-i | | 4 | 25 | | kaggle |

others

| name | description | table number | column number | SQL | source |
| --- | --- | --- | --- | --- | --- |
| talkingdata-mobile-user-demographics | | 8 | 34 | | kaggle |
| sf-crime | Incidents derived from the SFPD Crime Incident Reporting system | 3 | 57 | | tianchi |
| detecting-insults-in-social-commentary | Detect social spam, account hacking, bot attacks, and more | 1 | 5 | | kaggle |
| expedia-hotel-recommendations | Customer behavior | 2 | 174 | | kaggle |
| nfl-big-data-bowl-2022 | | 7 | 113 | | |
| airbnb-recruiting-new-user-bookings | Users along with their demographics, web session records, and some summary statistics | 6 | 51 | | kaggle |
| unimelb | Information on the investigators who are applying for the grant | 1 | 251 | | kaggle |
| Ipin2016Dataset | | 8 | 314 | | |
| dspp1 | | 4 | 19 | | |
| lish-moa | | 4 | 1488 | | |
| foursquare-location-matching | | 2 | 38 | | |
| bike-sharing-demand | The duration of travel, departure location, arrival location, and time elapsed | 1 | 12 | | kaggle |
| web-traffic-time-series-forecasting | | 6 | 1363 | | |
| web-traffic-time-series-forecasting-1 | | 2 | 553 | | |
| korean-baseball-pitching-data-1982-2021 | Team pitching data from every season of KBO Baseball | 1 | 34 | | kaggle |
| RSSI_dataset | RSSIs obtained on smartphones | 2 | 12 | | UCI_ML |
| DontGetKicked | Car information | 2 | 67 | | kaggle |
| cyclistic-bike-share-user-dataset-1-year | Cyclistic bikes | 1 | 18 | | kaggle |
| data-science-job-salaries | | 1 | 12 | | |
| Hybrid_Indoor_Positioning | | 1 | 67 | | UCI_ML |