Yandex.Cloud S3 HDFS PySpark CLI Shell Hadoop Cluster Administration
| Week number | Payment type | Average check | Tip-to-trip-cost ratio |
|---|---|---|---|
| 8 | Cash | 8.35 | 0.113323345634572907 |
Data schema:
erDiagram
TRIP |o--|| VENDOR: has
TRIP |o--|| RATES: has
TRIP |o--|| PAYMENT: has
1,2020-04-01 00:41:22,2020-04-01 01:01:53,1,1.20,1,N,41,24,2,5.5,0.5,0.5,0,0,0.3,6.8,0

Learn more about the data source here
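To make the sample record easier to read, here is a minimal sketch that maps its 18 comma-separated fields to named columns. The column names are an assumption: they follow the public NYC TLC yellow taxi layout, which the sample row appears to match, and the `parse_trip` helper is hypothetical rather than part of this project.

```python
import csv
from datetime import datetime

# Assumed column order (NYC TLC yellow taxi layout); not confirmed by this README.
COLUMNS = [
    "VendorID", "tpep_pickup_datetime", "tpep_dropoff_datetime",
    "passenger_count", "trip_distance", "RatecodeID", "store_and_fwd_flag",
    "PULocationID", "DOLocationID", "payment_type", "fare_amount", "extra",
    "mta_tax", "tip_amount", "tolls_amount", "improvement_surcharge",
    "total_amount", "congestion_surcharge",
]


def parse_trip(line: str) -> dict:
    """Turn one CSV line into a dict keyed by the assumed column names."""
    values = next(csv.reader([line]))
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    row = dict(zip(COLUMNS, values))
    # Convert only the fields used later for the weekly aggregation.
    for name in ("tpep_pickup_datetime", "tpep_dropoff_datetime"):
        row[name] = datetime.strptime(row[name], "%Y-%m-%d %H:%M:%S")
    for name in ("fare_amount", "tip_amount", "total_amount"):
        row[name] = float(row[name])
    row["payment_type"] = int(row["payment_type"])
    return row


SAMPLE = ("1,2020-04-01 00:41:22,2020-04-01 01:01:53,1,1.20,1,N,41,24,2,"
          "5.5,0.5,0.5,0,0,0.3,6.8,0")
trip = parse_trip(SAMPLE)
```

Under this assumed layout, the sample trip has `payment_type` 2, a `tip_amount` of 0 and a `total_amount` of 6.8.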
- Deploying a Hadoop cluster using a Yandex.Cloud solution.
- Creating a bucket using the Yandex.Cloud S3 solution.
- Copying the data (database) into the created S3 bucket using distcp.
- Initiating a Spark job.