# **arXiv Metadata Analytics with PySpark RDD: JSON Case Study**

### Udemy Course: Best Hands-on Big Data Practices and Use Cases using PySpark

### Author: Amin Karami (PhD, FHEA)
#### email: amin.karami@ymail.com

In [1]:
########## ONLY in Colab ##########
!pip3 install pyspark
########## ONLY in Colab ##########



In [2]:
# Initializing Spark
from pyspark import SparkContext, SparkConf

conf1 = SparkConf().setAppName("Archive_Pyspark").setMaster("local[*]")
sc = SparkContext(conf=conf1)

print(sc)

print("ready to go")

23/03/19 20:33:02 WARN Utils: Your hostname, Adrian-Laptop.local resolves to a loopback address: 127.0.0.1; using 192.168.100.19 instead (on interface en0)
23/03/19 20:33:02 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address


Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


23/03/19 20:33:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/03/19 20:33:04 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
<SparkContext master=local[*] appName=Archive_Pyspark>
ready to go


In [3]:
# Read and Load Data to Spark
# Data source: https://www.kaggle.com/Cornell-University/arxiv/version/62
import json

rdd_json = sc.textFile("data.json", 100)
rdd = rdd_json.map(lambda x: json.loads(x))

rdd.persist()



PythonRDD[2] at RDD at PythonRDD.scala:53

In [None]:
# Check the number of parallelism and partitions:
print(sc.defaultParallelism)
print(rdd.getNumPartitions())

In [None]:
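# Optional: findspark locates a local Spark installation and adds pyspark to sys.path.
# It is only needed when `import pyspark` fails, and must then run before that import.
import findspark
findspark.init()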



## Question 1: Count elements

In [None]:
rdd.count()

## Question 2: Get the first two records


In [None]:
rdd.take(2)

## Question 3: Get all attributes


In [None]:
# Each record is a dict; keys() yields its attribute names
rdd.flatMap(lambda x: x.keys()).distinct().collect()
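
`flatMap` expects one iterable per record and flattens them, so collecting every attribute amounts to a union over the keys of all records. A minimal plain-Python sketch of the same idea follows; the two records are hypothetical and far smaller than real arXiv entries.

```python
# Hypothetical records, for illustration only.
records = [
    {"id": "0704.0001", "title": "A"},
    {"id": "0704.0002", "title": "B", "license": None},
]

# Flatten the keys of every record and keep each name once
# (the local counterpart of flatMap(...).distinct()).
attributes = sorted({key for record in records for key in record.keys()})

print(attributes)
```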

## Question 4: Get the name of the licenses

In [None]:
rdd.map(lambda x: x["license"]).distinct().collect()

## Question 5: Get the shortest and the longest titles

In [None]:
# Compare titles by length; plain < and > on strings would compare alphabetically
shortest_title = rdd.map(lambda x: x['title']).reduce(lambda x, y: x if len(x) < len(y) else y)
longest_title = rdd.map(lambda x: x['title']).reduce(lambda x, y: x if len(x) > len(y) else y)

print("The longest: ", longest_title)
print("The shortest: ", shortest_title)
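
Python's `<` and `>` on strings compare lexicographically, not by length, so a reducer for the shortest or longest title has to compare `len(...)`. The sketch below checks this locally with `functools.reduce`, which applies a two-argument function pairwise much as `RDD.reduce` does (Spark may combine partitions in a different order). The sample titles are made up for illustration.

```python
from functools import reduce

# Hypothetical sample titles (not taken from the dataset).
titles = [
    "Zeta functions",
    "On graphs",
    "A very long title about sparse matrices",
]

# Keep the shorter / longer of the two strings at each step.
shortest = reduce(lambda x, y: x if len(x) < len(y) else y, titles)
longest = reduce(lambda x, y: x if len(x) > len(y) else y, titles)

# Lexicographic comparison picks the alphabetically first title instead,
# which here happens to be the longest one.
alphabetical_first = reduce(lambda x, y: x if x < y else y, titles)

print(shortest)
print(longest)
print(alphabetical_first)
```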

## Question 6: Find abbreviations with 5 or more letters in the abstract

In [7]:
import re

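def get_abbrivations(line):
    # Match "(TOKEN)": a letter followed by at least 4 more characters (5+ in total),
    # excluding whitespace, parentheses and a few symbols, so that the match stays
    # inside a single pair of parentheses.
    result = re.search(r"\(([A-Za-z][^_/\\<>*()\s]{4,})\)", line)
    if result:
        return result.group(1)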

In [None]:
rdd.filter(lambda x: get_abbrivations(x['abstract'])).count()

In [None]:
rdd_2=rdd.filter(lambda x: get_abbrivations(x['abstract']))
rdd_2.take(3)
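
Before running a regular expression over the full dataset, it can help to try it on a few strings locally. The sketch below assumes one possible reading of "5 or more letters": a parenthesised token that starts with a letter and contains no whitespace. The helper name `first_abbreviation` and the sample sentences are hypothetical.

```python
import re

# Assumed pattern: a letter followed by at least four more characters,
# none of which is whitespace, a parenthesis, or one of _ / \ < > *.
ABBREVIATION = re.compile(r"\(([A-Za-z][^_/\\<>*()\s]{4,})\)")

def first_abbreviation(text):
    """Return the first parenthesised token of 5+ characters, or None."""
    match = ABBREVIATION.search(text)
    return match.group(1) if match else None

# Made-up abstracts, for illustration only.
samples = [
    "Quantum chromodynamics (QCD) is assumed.",
    "We use a superconducting quantum interference device (SQUID).",
    "See (QCD) and the event generator (PYTHIA) for details.",
    "(not an abbreviation)",
]

results = [first_abbreviation(s) for s in samples]
print(results)
```

Tokens shorter than five characters and parenthesised phrases containing spaces return `None`, so a `filter` built on this helper would drop those records.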

## Question 7: Get the number of archive records per month ('update_date' attribute)

In [None]:
import datetime

def extract_date(DateIn):
    d = datetime.datetime.strptime(DateIn, "%Y-%m-%d")
    return d.month

extract_date('2003-12-09')
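
Counting records per month is a map to `(month, 1)` pairs followed by a sum per key. The same logic can be checked locally with `collections.Counter`; the dates below are hypothetical `update_date` values, and `month_of` is an illustrative helper rather than part of the notebook.

```python
from collections import Counter
from datetime import datetime

def month_of(date_string):
    """Parse a 'YYYY-MM-DD' string and return its month number (1-12)."""
    return datetime.strptime(date_string, "%Y-%m-%d").month

# Hypothetical 'update_date' values, for illustration only.
update_dates = ["2003-12-09", "2008-11-26", "2009-12-01", "2015-05-13"]

# Counter does locally what map(-> (month, 1)) plus a per-key sum does in Spark.
records_per_month = Counter(month_of(d) for d in update_dates)

# Sort by count, ascending.
by_count = sorted(records_per_month.items(), key=lambda kv: kv[1])
print(by_count)
```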

In [None]:
rdd.map(lambda x: (extract_date(x['update_date']), 1)).reduceByKey(lambda x,y : x+y).sortBy(lambda l: l[1]).collect()

## Question 8: Get the average number of pages

In [8]:
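def get_page(line):
    # Find occurrences such as "54 pages"; the raw string keeps \d as a regex escape
    search = re.findall(r'\d+ pages', line)
    if search:
        # Use the first match and keep only the number
        return int(search[0].split(" ")[0])
    else:
        return 0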
    

get_page('54 pages')

54

In [None]:
# Page count per record; records without a comment or a page count yield 0
rdd_average = rdd.map(lambda x: get_page(x['comments'] if x['comments'] is not None else "None"))

# Remove records with no page information
rdd_average = rdd_average.filter(lambda x: x != 0)

average_counter = rdd_average.count()

average_summation = rdd_average.reduce(lambda x, y: x + y)

print(average_counter)
print(average_summation)

print("The average number of pages is:", average_summation / average_counter)
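
The average can also be computed in a single pass by folding `(sum, count)` pairs instead of running a separate `count` and `reduce`. The plain-Python sketch below illustrates that fold on made-up comment strings; `pages_in` is a hypothetical stand-in for the page-extraction helper. In Spark, `RDD.aggregate` or `RDD.mean()` could play the same role, though that is a suggestion rather than what the notebook uses.

```python
import re
from functools import reduce

def pages_in(comment):
    """Return the first 'N pages' number in a comment, or 0 if absent."""
    found = re.findall(r"\d+ pages", comment or "")
    return int(found[0].split(" ")[0]) if found else 0

# Made-up 'comments' values, including a missing one.
comments = [
    "37 pages, 15 figures",
    None,
    "4 pages",
    "Accepted for publication",
    "10 pages, revtex",
]

# Drop records that carry no page information.
page_counts = [p for p in map(pages_in, comments) if p != 0]

# Fold into a running (sum, count) pair, then divide once at the end.
total, count = reduce(lambda acc, p: (acc[0] + p, acc[1] + 1), page_counts, (0, 0))
average = total / count

print(total, count, average)
```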




In [None]:
import warnings
warnings.filterwarnings('ignore')

In [10]:

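def get_Day_from_version(item):
    # 'versions' is a list of dicts; the 'created' field of the first version
    # starts with the weekday (e.g. 'Mon, ...'), so keep the text before the comma.
    return item[0]['created'].split(",")[0]

def get_tuple(item):
    # item is (weekday, (pages, 1)); return the page count
    _, value = item
    return value[0]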

In [13]:
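# Average page count per weekday of the first version:
# build (weekday, (pages, 1)), drop records without a page count,
# sum pages and counts per weekday, then divide.
rdd.map(lambda x: (get_Day_from_version(x['versions']), (get_page(x['comments'] if x['comments'] is not None else "None"), 1)))\
    .filter(lambda x: get_tuple(x) != 0)\
    .reduceByKey(lambda x, y: (x[0] + y[0], x[1] + y[1]))\
    .map(lambda x: (x[0], x[1][0] / x[1][1]))\
    .collect()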






23/03/19 20:40:58 WARN MemoryStore: Not enough space to cache rdd_2_21 in memory! (computed 15.7 MiB so far)
23/03/19 20:40:58 WARN BlockManager: Block rdd_2_21 could not be removed as it was not found on disk or in memory
23/03/19 20:40:58 WARN BlockManager: Putting block rdd_2_21 failed
23/03/19 20:40:58 WARN MemoryStore: Not enough space to cache rdd_2_22 in memory! (computed 15.5 MiB so far)
23/03/19 20:40:58 WARN MemoryStore: Not enough space to cache rdd_2_23 in memory! (computed 15.4 MiB so far)
23/03/19 20:40:58 WARN BlockManager: Block rdd_2_23 could not be removed as it was not found on disk or in memory
23/03/19 20:40:58 WARN BlockManager: Block rdd_2_22 could not be removed as it was not found on disk or in memory
23/03/19 20:40:58 WARN BlockManager: Putting block rdd_2_23 failed
23/03/19 20:40:58 WARN BlockManager: Putting block rdd_2_22 failed






                                                                                

[('Fri', 17.648611453234945),
 ('Sun', 18.44242086262309),
 ('Sat', 17.879148288973383),
 ('Thu', 17.78658802556468),
 ('Tue', 17.80535680192016),
 ('Wed', 17.818479233465066),
 ('Mon', 17.983318652279234)]
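
The weekday keys above come from the creation timestamp of each record's first version. As a sketch of a more explicit way to derive that key, the standard library's `email.utils.parsedate_to_datetime` can parse such a timestamp, assuming each entry in `versions` is a dict with a `created` field in the RFC 2822 style shown below. That record layout and the helper name `weekday_of_first_version` are assumptions for illustration.

```python
from email.utils import parsedate_to_datetime

# Fixed English abbreviations, so the result does not depend on the locale.
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def weekday_of_first_version(versions):
    """Return the weekday abbreviation of the first version's 'created' date."""
    created = versions[0]["created"]
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    return WEEKDAYS[parsedate_to_datetime(created).weekday()]

# Assumed shape of a 'versions' value.
sample_versions = [
    {"version": "v1", "created": "Mon, 2 Apr 2007 19:18:42 GMT"},
    {"version": "v2", "created": "Tue, 24 Jul 2007 20:10:27 GMT"},
]

day = weekday_of_first_version(sample_versions)
print(day)
```

Parsing the date rather than slicing the string means the weekday is computed from the calendar date itself, at the cost of a slightly slower function per record.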