Avenga Internship task

Task:

Here is a dataset: "https://www.kaggle.com/datasets/anandaramg/taxi-trip-data-nyc?resource=download". Using Apache Spark (Scala or Python), produce the following report:

|-- Vendor: string (nullable = true) - name of the vendor
|-- Payment Type: string (nullable = true) - name of the payment type
|-- Payment Rate: double (nullable = true) - payment total / passenger count per vendor and payment type
|-- Next Payment Rate: double (nullable = true) - the next (bigger) record payment rate for the vendor
|-- Max Payment Rate: double (nullable = true) - max payment rate for the vendor
|-- Percents to next rate: string (nullable = true) - how many percentage points are needed to reach the next record payment rate
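As a worked illustration of the report columns, here is a minimal plain-Python sketch (no Spark) of one possible reading of the task. It assumes that the payment rate is the summed payment total divided by the summed passenger count for each vendor and payment type, and that the "next" rate is the smallest rate of the same vendor that is strictly bigger. The sample rows, the vendor names, and the helper names `payment_rates` and `next_and_max` are hypothetical and are not taken from the dataset or the notebook.

```python
from collections import defaultdict

# Illustrative rows only (NOT from the dataset):
# (vendor, payment_type, total_amount, passenger_count)
trips = [
    ("Vendor A", "Credit card", 20.0, 2),
    ("Vendor A", "Credit card", 10.0, 1),
    ("Vendor A", "Cash", 9.0, 3),
    ("Vendor B", "Cash", 12.0, 1),
]


def payment_rates(rows):
    """Assumed definition: sum(total) / sum(passengers) per (vendor, payment type)."""
    sums = defaultdict(lambda: [0.0, 0])
    for vendor, payment_type, total, passengers in rows:
        if passengers <= 0:
            continue  # skip rows that would make the ratio meaningless
        acc = sums[(vendor, payment_type)]
        acc[0] += total
        acc[1] += passengers
    return {key: total / count for key, (total, count) in sums.items()}


def next_and_max(rates):
    """For each key return (rate, next bigger rate for the vendor or None, vendor max)."""
    by_vendor = defaultdict(list)
    for (vendor, _), rate in rates.items():
        by_vendor[vendor].append(rate)
    report = {}
    for (vendor, payment_type), rate in rates.items():
        bigger = sorted(r for r in by_vendor[vendor] if r > rate)
        next_rate = bigger[0] if bigger else None
        report[(vendor, payment_type)] = (rate, next_rate, max(by_vendor[vendor]))
    return report


rates = payment_rates(trips)
report = next_and_max(rates)
for key in sorted(report):
    print(key, report[key])
```

In Spark the same grouping would typically be expressed as a group-by aggregation followed by a window over each vendor; the plain-Python version is used here only to make the arithmetic explicit.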

Explanation:

Rename the values in the VendorID and payment_type columns to their names.
Create a table with payment_rate per vendor and payment type, excluding unknown vendors and unsuitable data such as nulls or records from months other than July.
Create a table with the average payment_rate per day for each vendor.
Calculate the daily growth rate of payment_rate: growth rate = (payment_rate today - payment_rate yesterday) / payment_rate yesterday.
Calculate the average daily growth rate and multiply it by the 30 days in a month to get the monthly growth in percent (the percents_to_next_rate column).
Create a table with the average payment_rate per vendor per month and max_payment_rate.
Add and edit data in the resulting table.
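The growth-rate steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the notebook's code: the helper names `daily_growth_rates` and `monthly_growth_percent` are hypothetical, the input values are made up, and the final multiplication by 100 (to turn a fraction into a percentage) is an assumption about how the percent value is obtained.

```python
def daily_growth_rates(daily_rates):
    """growth rate = (rate today - rate yesterday) / rate yesterday, for consecutive days."""
    return [
        (today - yesterday) / yesterday
        for yesterday, today in zip(daily_rates, daily_rates[1:])
    ]


def monthly_growth_percent(daily_rates, days_in_month=30):
    """Average daily growth rate scaled to a month and expressed in percent.

    Assumption: the fraction is multiplied by 100 to obtain a percentage.
    """
    growth = daily_growth_rates(daily_rates)
    average_growth = sum(growth) / len(growth)
    return average_growth * days_in_month * 100


# Made-up average payment_rate values for one vendor on three consecutive days.
example_daily_rates = [16.0, 20.0, 20.0]

growth = daily_growth_rates(example_daily_rates)
monthly = monthly_growth_percent(example_daily_rates)
print(growth)   # [0.25, 0.0]
print(monthly)  # 375.0
```

Note that this is a linear extrapolation: the average daily change is simply multiplied by the number of days and is not compounded.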

OleksandrZhydyk/Avenga
