SparkSQL with Python

This repository contains examples of using Spark and SparkSQL with Python through PySpark.

Profeco

We will work with the Profeco dataset (which you can download here: Profeco), a daily historical record of more than 2,000 products, as of 2015, in various establishments in Mexico.

Check the code here

How many records are there?
How many categories are there?
How many retail chains are being monitored (and therefore reported in that database)?
What are the most monitored products in each state of the country?
Which retail chain has the greatest variety of monitored products?
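
As a rough illustration of how the questions above could be expressed in SparkSQL, here is a minimal sketch. It is not the repository's actual code. The column names (`categoria`, `cadenaComercial`, `producto`, `estado`), the view name `profeco`, and the helper `run_profeco_queries` are assumptions made for this example; adjust them to the real schema of the downloaded file.

```python
# Sketch only: column names and the view name are assumptions, not taken
# from the repository. Check them against the header of the CSV file.
PROFECO_QUERIES = {
    "record_count": "SELECT COUNT(*) AS records FROM profeco",
    "category_count": (
        "SELECT COUNT(DISTINCT categoria) AS categories FROM profeco"
    ),
    "chain_count": (
        "SELECT COUNT(DISTINCT cadenaComercial) AS chains FROM profeco"
    ),
    # Count rows per (state, product), rank products inside each state,
    # and keep only the top-ranked product of every state.
    "top_product_per_state": """
        SELECT estado, producto, monitored
        FROM (
            SELECT estado, producto, monitored,
                   ROW_NUMBER() OVER (
                       PARTITION BY estado ORDER BY monitored DESC
                   ) AS rn
            FROM (
                SELECT estado, producto, COUNT(*) AS monitored
                FROM profeco
                GROUP BY estado, producto
            ) counts
        ) ranked
        WHERE rn = 1
    """,
    "chain_with_most_products": """
        SELECT cadenaComercial, COUNT(DISTINCT producto) AS products
        FROM profeco
        GROUP BY cadenaComercial
        ORDER BY products DESC
        LIMIT 1
    """,
}


def run_profeco_queries(spark, csv_path):
    """Load the CSV, register it as a temporary view, and run every query.

    `spark` is an existing SparkSession and `csv_path` is the location of
    the downloaded dataset. Returns a dict mapping query name to the list
    of collected rows.
    """
    df = spark.read.csv(csv_path, header=True, inferSchema=True)
    df.createOrReplaceTempView("profeco")
    return {
        name: spark.sql(query).collect()
        for name, query in PROFECO_QUERIES.items()
    }
```

With a `SparkSession` already created, calling `run_profeco_queries(spark, path_to_csv)` would return the answer rows keyed by question. The per-state query keeps only the single most monitored product; changing `rn = 1` to `rn <= 5` would keep the top five instead.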

Countries airports

Check the code here

API to count the number of tweets within a 1 km radius

I extract all the distinct tweets that carry geographic data into a separate file, "tweets_geo.csv", which makes this data easier to manipulate in SparkSQL queries.

Check the data preparation code here

The code for the REST API is in the API folder of this repository.
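
To make the idea of counting tweets inside a 1 km radius concrete, here is a minimal sketch in plain Python that uses the haversine great-circle distance. It is an illustration under stated assumptions, not the implementation in the API folder: the function names `haversine_km` and `count_within_radius`, the input format (a list of latitude/longitude pairs), and the Earth radius of 6371 km are all choices made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def count_within_radius(center_lat, center_lon, points, radius_km=1.0):
    """Count the (lat, lon) pairs lying within radius_km of the center."""
    return sum(
        1
        for lat, lon in points
        if haversine_km(center_lat, center_lon, lat, lon) <= radius_km
    )
```

For example, `count_within_radius(19.4326, -99.1332, tweet_points)` would count the tweets around that coordinate, where `tweet_points` is a hypothetical list built from the rows of "tweets_geo.csv". The same distance formula can also be written directly in a SparkSQL query using the built-in trigonometric functions if the filtering should happen inside Spark.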

Contributing and Feedback

Any ideas or feedback about this repository? Help me improve it.

Authors

Created by Ramses Alexander Coraspe Valdez
Created in 2020

License

This project is licensed under the terms of the MIT license.
