### Preliminary note:
This notebook assumes that you are running it on the cluster. You can also run it on a local machine, but you will then need to change the addresses of ZooKeeper and the brokers.


# Kafka

**Kafka** is a distributed publish-subscribe messaging system focused on high throughput. All messages ingested by Kafka are persisted on disk and replicated within the cluster to guarantee fault tolerance and thus prevent data loss. Kafka relies on another service, **ZooKeeper**, which provides the synchronisation information Kafka needs to run properly in a distributed way.

The benefits of Kafka are thus:

- *Reliability*, as it is distributed, partitioned, replicated and fault tolerant.
- *Scalability*, as a Kafka cluster can be grown or shrunk on demand, without downtime, to fit the actual load.
- *Durability*, as all messages are persisted on disk.
- *Performance*, as it is designed for high throughput.

What can Kafka be used for?

- *Log aggregation*, even though other systems may now be more appropriate for that purpose.
- *Metrics*: as Kafka can ingest vast amounts of data, collecting metrics from multiple servers was a strong point for monitoring systems, even though, again, other systems now exist for such scenarios.
- *Stream processing*: this is where Kafka shines. Frameworks like Storm and Spark Streaming can use Kafka to process information in a streaming fashion, which offers different means of analysing data. In MapReduce you can only do batch processing: you cannot handle a continuous influx of information without investing a lot of time in automating your job submissions. In Spark Streaming, by contrast, an application can run permanently and analyse data as it arrives.
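The difference between batch and stream processing can be illustrated without any Kafka dependency. The sketch below is a toy: `batch_word_count` and `streaming_word_count` are hypothetical names introduced here, and a plain Python iterator stands in for a topic that keeps delivering messages.

```python
from collections import Counter


def batch_word_count(lines):
    """Batch style: the whole input is read before a result is returned."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def streaming_word_count(lines):
    """Streaming style: yield an updated snapshot after every incoming line."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
        yield dict(counts)


# Made-up messages, for illustration only.
messages = ["kafka streams data", "spark reads kafka"]

final = batch_word_count(messages)
snapshots = list(streaming_word_count(iter(messages)))

print(final["kafka"])   # available only once all messages have been read
print(snapshots[0])     # a partial result exists after the first message
```

The streaming variant exposes an intermediate result after each message, which is what lets a permanently running application react to data as it arrives.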

# Kafka concepts

To use Kafka, you have to be familiar with its core concepts, such as *topics*, *brokers*, *producers* and *consumers*.

![alt text](https://www.tutorialspoint.com/apache_kafka/images/fundamentals.jpg "Kafka Architecture with a replication factor of 3")

- **Topic** A stream of messages belonging to a particular category is called a topic. Data is stored in topics.
- **Partition** Topics are split into partitions. For each topic, Kafka keeps a minimum of one partition. Each partition contains messages in an immutable, ordered sequence.
- **Partition offset** Each message within a partition has a unique sequence id called its offset.
- **Replicas** Replicas are nothing but backups of a partition. They are used to prevent data loss.
- **Broker** Brokers are worker processes responsible for maintaining the published data (accepting data from producers, serving it to consumers). Each broker may hold zero or more partitions per topic.
- **Kafka cluster** The set of all Kafka brokers. A Kafka cluster can be expanded without downtime. These clusters are used to manage the persistence and replication of message data.
- **Producers** Producers publish messages to one or more topics by sending them to the brokers.
- **Consumers** Consumers subscribe to one or more topics and read the published messages from the brokers.
- **Leader** Node responsible for all reads and writes for a given partition. Each partition has one broker acting as leader.
- **Follower** Node which follows the **Leader**'s instructions. A follower may become the leader if the node currently holding that role fails. In practice, a follower acts just like a consumer: it consumes data from its leader to maintain its own copy of the data.
- **ZooKeeper** ZooKeeper is used for managing and coordinating the Kafka brokers. It is mainly used to notify producers and consumers about the presence of a new broker in the Kafka system or the failure of an existing one. Based on these notifications, producers and consumers decide how to proceed and start coordinating their work with another broker.
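To make the notions of partition, offset and message key more concrete, here is a minimal in-memory sketch. It is not how Kafka is implemented: `ToyTopic` is a hypothetical class, and the CRC32 hash is an assumption chosen only because it is deterministic (Kafka's default partitioner uses a different hash function).

```python
import zlib


class ToyTopic:
    """In-memory stand-in for a topic: a list of append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        self._next = 0  # round-robin pointer for messages without a key

    def send(self, value, key=None):
        """Append a message and return its (partition, offset)."""
        if key is None:
            partition = self._next % len(self.partitions)
            self._next += 1
        else:
            # Assumption: CRC32 is used purely for illustration.
            partition = zlib.crc32(key) % len(self.partitions)
        log = self.partitions[partition]
        log.append(value)
        return partition, len(log) - 1  # the offset is the position in the log

    def read(self, partition, offset=0):
        """Return the messages of one partition, starting at the given offset."""
        return self.partitions[partition][offset:]


topic = ToyTopic(num_partitions=5)
p1, o1 = topic.send(b"first line", key=b"book-1")
p2, o2 = topic.send(b"second line", key=b"book-1")

print(p1 == p2)           # same key, same partition
print(o1, o2)             # offsets grow by one within a partition
print(topic.read(p1, 1))  # reading from offset 1 skips the first message
```

Two properties carry over to real Kafka: messages with the same key land in the same partition, and an offset only has meaning within one partition.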


# Using Kafka from Command Line

## Starting Kafka

On your computer or on the VM, you need to start ZooKeeper and Kafka as explained in the instruction file. Since Jupyter notebooks do not support background processes, you have to run the following commands from a terminal:


In [None]:
# $KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties 
# $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties 
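Before running the rest of the notebook, it can help to verify that both services accept connections. The following sketch only checks that a TCP port is open; `port_open` is a hypothetical helper, and the ports (2181 for ZooKeeper, 9092 for the broker) are the ones used by the commands later in this notebook, so adapt them if your configuration differs.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Assumed addresses: localhost:2181 (ZooKeeper) and localhost:9092 (Kafka broker).
for name, port in [("ZooKeeper", 2181), ("Kafka broker", 9092)]:
    state = "reachable" if port_open("localhost", port) else "not reachable"
    print(f"{name} on localhost:{port} is {state}")
```

An open port does not prove that the service is healthy, but a closed one is a quick indication that it has not been started.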



## Listing Topics

In [1]:
# The kafka-topics command can be used to list available topics, create new ones, and delete existing ones
# we have to specify the host+port where ZooKeeper is running (since ZooKeeper keeps the metadata of all topics)
# note: we redirect stderr to /dev/null, otherwise you'll get lots of INFO log messages
# (when Kafka is installed on your own machine this is normally not necessary)
!kafka-topics.sh --list --zookeeper localhost:2181 2>/dev/null

## Creating a topic

In [2]:
# Let's create a new topic.
#
# Here we create a topic <username>.test1, which is split into 5 partitions,
# with a replication factor of 1 (each partition is stored on a single broker)
#
# If you want to increase the replication factor, you have to start more than one kafka broker instance, as
# explained in the instruction file.
!kafka-topics.sh --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 5 --topic "$USER.test1" 2>/dev/null

Created topic bigdata.test1.


In [3]:
!kafka-topics.sh --list --zookeeper localhost:2181 2>/dev/null

bigdata.test1


## Removing a topic

In [4]:
!kafka-topics.sh --delete --zookeeper localhost:2181 --topic "$USER.test1" 2>/dev/null
!kafka-topics.sh --list --zookeeper localhost:2181 2>/dev/null 

Topic bigdata.test1 is marked for deletion.
Note: This will have no impact if delete.topic.enable is not set to true.
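If you prefer to drive these commands from Python rather than with `!` shell escapes, the argument lists can be built programmatically. This is a sketch under assumptions: `kafka_topics_args` and `run_kafka_topics` are hypothetical helpers, `kafka-topics.sh` is assumed to be on your `PATH`, and only the flags that appear in this notebook are supported.

```python
import subprocess


def kafka_topics_args(action, zookeeper="localhost:2181", topic=None,
                      partitions=None, replication_factor=None):
    """Build the argument list for kafka-topics.sh (flags as used in this notebook)."""
    if action not in ("list", "create", "delete", "describe"):
        raise ValueError(f"unsupported action: {action}")
    args = ["kafka-topics.sh", f"--{action}", "--zookeeper", zookeeper]
    if replication_factor is not None:
        args += ["--replication-factor", str(replication_factor)]
    if partitions is not None:
        args += ["--partitions", str(partitions)]
    if topic is not None:
        args += ["--topic", topic]
    return args


def run_kafka_topics(*args, **kwargs):
    """Run the command and return its stdout; stderr is discarded, like 2>/dev/null."""
    cmd = kafka_topics_args(*args, **kwargs)
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL, text=True)
    return result.stdout


# Only builds and prints the command; call run_kafka_topics(...) to execute it.
print(kafka_topics_args("create", topic="bigdata.test1",
                        partitions=5, replication_factor=1))
```

Passing the arguments as a list avoids shell quoting issues with topic names.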


## Starting a producer - With file input

**Attention:** the following examples assume that you have a folder "INFOH515/books" in your current working directory.
We created this folder and put example books in it during the 1st lab session. Only execute the following command if you have deleted this folder and its files in the meantime.

In [5]:
!mkdir -p ./INFOH515/books

!wget --quiet http://www.gutenberg.org/cache/epub/20417/pg20417.txt -O ./INFOH515/books/pg20417.txt
!wget --quiet http://www.gutenberg.org/cache/epub/20418/pg20418.txt -O ./INFOH515/books/pg20418.txt
!wget --quiet http://www.gutenberg.org/cache/epub/20419/pg20419.txt -O ./INFOH515/books/pg20419.txt
!wget --quiet http://www.gutenberg.org/cache/epub/20420/pg20420.txt -O ./INFOH515/books/pg20420.txt
!wget --quiet http://www.gutenberg.org/cache/epub/20421/pg20421.txt -O ./INFOH515/books/pg20421.txt
!wget --quiet http://www.gutenberg.org/cache/epub/20422/pg20422.txt -O ./INFOH515/books/pg20422.txt
!wget --quiet http://www.gutenberg.org/cache/epub/20423/pg20423.txt -O ./INFOH515/books/pg20423.txt
!wget --quiet http://www.gutenberg.org/cache/epub/20424/pg20424.txt -O ./INFOH515/books/pg20424.txt
!wget --quiet http://www.gutenberg.org/cache/epub/20425/pg20425.txt -O ./INFOH515/books/pg20425.txt
!wget --quiet http://www.gutenberg.org/cache/epub/20426/pg20426.txt -O ./INFOH515/books/pg20426.txt
!echo "Books downloaded in ./INFOH515/books" 

Books downloaded in ./INFOH515/books


In [6]:
# First re-create the topic
!kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 5 --topic "$USER.test1" 2>/dev/null

# Now publish messages on this topic
# kafka-console-producer is a shell command that reads from stdin and publishes every line read
# on the specified topic as a separate message
# to use it, we need to specify the address of at least one kafka broker in our cluster 
# (in our case: localhost at port 9092, see the $KAFKA_HOME/config/server.properties file)
!kafka-console-producer.sh --broker-list localhost:9092 --topic "$USER.test1" < ./INFOH515/books/pg20417.txt 2>/dev/null

Created topic bigdata.test1.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
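The behaviour of the console producer (one message per input line) can be reproduced in Python. The sketch below is written against any object that offers `send` and `flush` methods like kafka-python's `KafkaProducer`; `publish_file` is a hypothetical helper, and the broker address in the usage comment is the one assumed throughout this notebook.

```python
def publish_file(producer, topic, path, key=None):
    """Send each line of a file as one message; return the number of messages sent."""
    sent = 0
    with open(path, "rb") as handle:  # read bytes, so no serializer is needed
        for raw_line in handle:
            # Strip the line terminator, as a line-oriented producer would.
            producer.send(topic, key=key, value=raw_line.rstrip(b"\r\n"))
            sent += 1
    producer.flush()  # block until the buffered messages have been handed over
    return sent


# Usage against a real broker (assumes kafka-python is installed and a broker
# listens on localhost:9092):
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers=["localhost:9092"])
#   publish_file(producer, "bigdata.test1", "./INFOH515/books/pg20417.txt")
```

Taking the producer as a parameter keeps the function independent of a running cluster, so it can be tried out with a stand-in object first.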

## Starting a consumer - From the beginning

In [7]:
# Now let's read all messages published on the topic, starting from the beginning
# kafka-console-consumer reads from the topic, and prints every message on stdout
# again, we need to specify the address of at least one kafka broker in our cluster
# (in this notebook we are using only one broker)
# the --from-beginning flag means that we start reading from the beginning of the stream,
# instead of reading from the end (waiting for new items to arrive)
#
# NOTE: this command will keep waiting for new messages to appear on the kafka topic and will
#       hence never terminate.
#       You'll need to cancel it (click on "Kernel" in the menu bar, and then on "Interrupt")
#       before going to the next one
#
# NOTE: If you look at the text that is output, you will see that some sentences may appear in an incorrect order
#       This is due to the fact that we partitioned the topic into 5 partitions and we read from all 5 of
#       these partitions in parallel. While the order is preserved inside a partition, it is not preserved
#       across partitions
!kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic "$USER.test1" --from-beginning 2>/dev/null

  WILD OR FERAL                                                  216
    Photo: W. S. Berridge, F.Z.S.

WOODPECKER HAMMERING AT A COTTON-REEL, ATTACHED TO A TREE        217

THE BEAVER                                                       220

THE THRUSH AT ITS ANVIL                                          221
    Photo: F. R. Hinkins & Son.

ALSATIAN WOLF-DOG                                                226
    Photo: Lafayette.

THE POLAR BEAR OF THE FAR NORTH                                  227
    Photo: W. S. Berridge.

AN ALLIGATOR "YAWNING" IN EXPECTATION OF FOOD                    227
    From the Smithsonian Report, 1914.

BABY ORANG                                                       232
    Photo: W. P. Dando.

ORANG-UTAN                                                       232
    Photo: Gambier Bolton.

CHIMPANZEE                                                       233
    Photo: James's Press Agency.

BABY ORANG-UTAN                   

THE STORY OF EVOLUTION




INTRODUCTORY

THE BEGINNING OF THE EARTH--MAKING A HOME FOR LIFE--THE FIRST LIVING
CREATURES


§ 1

The Evolution-idea is a master-key that opens many doors. It is a
luminous interpretation of the world, throwing the light of the past
upon the present. Everything is seen to be an antiquity, with a history
behind it--a _natural history_, which enables us to understand in some
measure how it has come to be as it is. We cannot say more than
"understand in some measure," for while the _fact_ of evolution is
certain, we are only beginning to discern the _factors_ that have been
at work.

The evolution-idea is very old, going back to some of the Greek
philosophers, but it is only in modern times that it has become an
essential part of our mental equipment. It is now an everyday
intellectual tool. It was applied to the origin of the solar system and
to the making of the earth before it was applied to plants and animals;
it was extended fro

    "The lands in the Cenozoic began to bloom with more and more
    flowering plants and grand hardwood forests, the atmosphere is
    scented with sweet odours, a vast crowd of new kinds of insects
    appear, and the places of the once dominant reptiles of the lands
    and seas are taken by the mammals. Out of these struggles there
    rises a greater intelligence, seen in nearly all of the mammal
    stocks, but particularly in one, the monkey-ape-man. Brute man
    appears on the scene with the introduction of the last glacial
    climate, a most trying time for all things endowed with life, and
    finally there results the dominance of reasoning man over all his
    brute associates."

In man and human society the story of evolution has its climax.


The Ascent of Man

Man stands apart from animals in his power of building up general ideas
and of using these in the guidance of his behaviour and the control of
his conduct. This is essentially wrapped up with h


We confess to being greatly encouraged by the reception that has been
given to the English serial issue of "The Outline of Science." It has
been very hearty--we might almost say enthusiastic. For we agree with
Professor John Dewey, that "the future of our civilisation depends upon
the widening spread and deepening hold of the scientific habit of mind."
And we hope that this is what "The Outline of Science" makes for.
Information is all to the good; interesting information is better still;
but best of all is the education of the scientific habit of mind.
Another modern philosopher, Professor L. T. Hobhouse, has declared that
the evolutionist's mundane goal is "the mastery by the human mind of the
conditions, internal as well as external, of its life and growth." Under
the influence of this conviction "The Outline of Science" has been
written. For life is not for science, but science for life. And even
more than science, to our way of thinking, is the individual developmen

A very simple animal accumulates a little store of potential energy, and
it proceeds to expend this, like an explosive, by acting on its
environment. It does so in a very characteristic self-preservative
fashion, so that it burns without being consumed and explodes without
being blown to bits. It is characteristic of the organism that it
remains a going concern for a longer or shorter period--its length of
life. Living creatures that expended their energy ineffectively or
self-destructively would be eliminated in the struggle for existence.
When a simple one-celled organism explores a corner of the field seen
under a microscope, behaving to all appearance very like a dog scouring
a field seen through a telescope, it seems permissible to think of
something corresponding to mental endeavour associated with its
activity. This impression is strengthened when an amoeba pursues
another amoeba, overtakes it, engulfs it, loses it, pursues it again,
recaptures it, and so on. What 

of enormous pressure--2-1/2 tons on the square inch at a depth of 2,500
fathoms; of profound calm, unbroken silence, immense monotony. And as
there are no plants in the great abysses, the animals must live on one
another, and, in the long run, on the rain of moribund animalcules which
sink from the surface through the miles of water. It seems a very
unpromising haunt of life, but it is abundantly tenanted, and it gives
us a glimpse of the insurgent nature of the living creature that the
difficulties of the Deep Sea should have been so effectively conquered.
It is probable that the colonising of the great abysses took place in
relatively recent times, for the fauna does not include many very
antique types. It is practically certain that the colonisation was due
to littoral animals which followed the food-débris, millennium after
millennium, further and further down the long slope from the shore.


The Freshwaters

4. A fourth haunt of life is that of the freshwaters, in

were taken by man's ancestors "during the long night of the geological
past." On the sides of the neck of the human embryo there are four pairs
of slits, the "visceral clefts," openings from the beginning of the
food-canals to the surface. There is no doubt as to their significance.
They correspond to the gill-slits of fishes and tadpoles. Yet in
reptiles, birds, and mammals they have no connection with breathing,
which is their function in fishes and amphibians. Indeed, they are not
of any use at all, except that the first becomes the Eustachian tube
bringing the ear-passage into connection with the back of the mouth, and
that the second and third have to do with the development of a curious
organ called the thymus gland. Persistent, nevertheless, these
gill-slits are, recalling even in man an aquatic ancestry of many
millions of years ago.

When all these lines of evidence are considered, they are seen to
converge in the conclusion that man is derived from a simian sto

material particles. X-rays were found to be a new variety of _light_
with a remarkable power of penetration. We have seen what the
spectroscope reveals about the varying nature of light wave-lengths.
Light-waves are set up by vibrations in ether,[2] and, as we shall see,
these ether disturbances are all of the same kind; they only differ as
regards wave-lengths. The X-rays which Röntgen discovered, then, are
light, but a variety of light previously unknown to us; they are ether
waves of very short length. X-rays have proved of great value in many
directions, as all the world knows, but that we need not discuss at this
point. Let us see what followed Röntgen's discovery.

    [2] We refer throughout to the "ether" because, although modern
    theories dispense largely with this conception, the theories of
    physics are so inextricably interwoven with it that it is necessary,
    in an elementary exposition, to assume its existence. The modern
    view will be explained 

still unknown on earth, which has been christened Coronium.


Measuring the Speed of Light

But this is not all; soon a new use was found for the spectroscope. We
found that we could measure with it the most difficult of all speeds
to measure, speed in the line of sight. Movement at right angles to the
direction in which one is looking is, if there is sufficient of it, easy
to detect, and, if the distance of the moving body is known, easy to
measure. But movement in the line of vision is both difficult to detect
and difficult to measure. Yet, even at the enormous distances with which
astronomers have to deal, the spectroscope can detect such movement and
furnish data for its measurement. If a luminous body containing, say,
sodium is moving rapidly towards the spectroscope, it will be found that
the sodium lines in the spectrum have moved slightly from their usual
definite positions towards the violet end of the spectrum, the amount of
the change of position increasing 

heat.]

[Illustration: THE VARIABLE MONITOR (_Varanus_)

The monitors are the largest of existing lizards, the Australian species
represented in the photograph attaining a length of four feet. It has a
brown colour with yellow spots, and in spite of its size it is not
conspicuous against certain backgrounds, such as the bark of a tree.]


§ 2

Gradual Change of Colour

The common shore-crab shows many different colours and mottlings,
especially when it is young. It may be green or grey, red or brown, and
so forth, and it is often in admirable adjustment to the colour of the
rock-pool where it is living. Experiments, which require extension, have
shown that when the crab has moulted, which it has to do very often when
it is young, the colour of the new shell tends to harmonise with the
general colour of the rocks and seaweed. How this is brought about, we
do not know. The colour does not seem to change till the next moult, and
not then unless there is some reason f

crest to crest. The distance from crest to crest of the ripples in a
pond is sometimes no more than an inch or two. This distance is
enormously great compared to the longest of the wave-lengths that
constitute light. We say the longest, for the waves of light differ in
length; the colour depends upon the length of the light. Red light has
the longest waves and violet the shortest. The longest waves, the waves
of deep-red light, are seven two hundred and fifty thousandths of an
inch in length (7/250,000 inch). This is nearly twice the length of
deep-violet light-waves, which are 1/67,000 inch. But light-waves, the
waves that affect the eye, are not the only waves carried by the ether.
Waves too short to affect the eye can affect the photographic plate, and
we can discover in this way the existence of waves only half the length
of the deep-violet waves. Still shorter waves can be discovered, until
we come to those excessively minute rays, the X-rays.


Below the Limits of

  LARVÆ, WHICH LIVE THERE                                        191

AVOCET'S BILL, ADAPTED FOR A CURIOUS SIDEWAYS SCOOPING IN
  THE SHORE-POOLS AND CATCHING SMALL ANIMALS                     191

HORNBILL'S BILL, ADAPTED FOR EXCAVATING A NEST IN A TREE,
  AND ALSO FOR SEIZING AND BREAKING DIVERSE FORMS OF FOOD,
  FROM MAMMALS TO TORTOISES, FROM ROOTS TO FRUITS                191

FALCON'S BILL, ADAPTED FOR SEIZING, KILLING, AND TEARING
  SMALL MAMMALS AND BIRDS                                        191

PUFFIN'S BILL, ADAPTED FOR CATCHING SMALL FISHES NEAR THE
  SURFACE OF THE SEA, AND FOR HOLDING THEM WHEN CAUGHT AND
  CARRYING THEM TO THE NEST                                      191

LIFE-HISTORY OF A FROG                                           192

HIND-LEG OF WHIRLIGIG BEETLE WHICH HAS BECOME BEAUTIFULLY
  MODIFIED FOR AQUATIC LOCOMOTION                                192
    Photo: J. J. Ward, F.E.S.

THE BIG ROBBER-CRAB (_Birgus Latro_), THAT CLIMBS T

    It was a country of swamps, low forests of birch, alder, and willow,
    fertile meadows, and snow-capped mountains. Its estuaries penetrated
    further inland than they now do, and the sea stood at the level of
    the Fifty-Foot Beach. On its plains and in its forests roamed many
    creatures which are strange to the fauna of to-day--the Elk and the
    Reindeer, Wild Cattle, the Wild Boar and perhaps Wild Horses, a
    fauna of large animals which paid toll to the European Lynx, the
    Brown Bear and the Wolf. In all likelihood, the marshes resounded to
    the boom of the Bittern and the plains to the breeding calls of the
    Crane and the Great Bustard.

Such is Dr. Ritchie's initial picture.

[Illustration: LIFE-HISTORY OF A FROG

1, Before hatching; 2, newly hatched larvæ hanging on to water-weed; 3,
with external gills; 4, external gills are covered over and are
absorbed; 5, limbless larva about a month old with internal gills; 6,
tadpole with hind-leg

^C
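The ordering behaviour visible in the output above (order kept within a partition, not across partitions) can be reproduced with a small simulation. This is a toy under explicit assumptions: lines are spread round-robin over the partitions, and a seeded random choice stands in for the unpredictable timing of parallel reads. `spread` and `read_interleaved` are hypothetical helpers.

```python
import random


def spread(lines, num_partitions):
    """Distribute lines round-robin over partitions, keeping (seq, line) pairs."""
    partitions = [[] for _ in range(num_partitions)]
    for seq, line in enumerate(lines):
        partitions[seq % num_partitions].append((seq, line))
    return partitions


def read_interleaved(partitions, rng):
    """Read all partitions in parallel: pick any partition with unread messages,
    but always take the oldest unread message of that partition."""
    cursors = [0] * len(partitions)
    out = []
    while True:
        ready = [p for p, log in enumerate(partitions) if cursors[p] < len(log)]
        if not ready:
            return out
        p = rng.choice(ready)
        out.append((p, *partitions[p][cursors[p]]))  # (partition, seq, line)
        cursors[p] += 1


lines = [f"line {i}" for i in range(20)]
consumed = read_interleaved(spread(lines, 5), random.Random(0))

global_order = [seq for _, seq, _ in consumed]
print("globally ordered:", global_order == sorted(global_order))
for p in range(5):
    per_partition = [seq for part, seq, _ in consumed if part == p]
    print(f"partition {p} ordered:", per_partition == sorted(per_partition))
```

Every message is delivered exactly once and each partition is read in order, yet nothing forces the merged stream to follow the original line order.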


## Describing a topic

In [8]:
!kafka-topics.sh --describe --zookeeper localhost:2181 --topic "$USER.test1" 2>/dev/null

Topic: bigdata.test1	PartitionCount: 5	ReplicationFactor: 1	Configs: 
	Topic: bigdata.test1	Partition: 0	Leader: 0	Replicas: 0	Isr: 0
	Topic: bigdata.test1	Partition: 1	Leader: 0	Replicas: 0	Isr: 0
	Topic: bigdata.test1	Partition: 2	Leader: 0	Replicas: 0	Isr: 0
	Topic: bigdata.test1	Partition: 3	Leader: 0	Replicas: 0	Isr: 0
	Topic: bigdata.test1	Partition: 4	Leader: 0	Replicas: 0	Isr: 0
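Each partition line reports the broker id of the leader, the brokers holding replicas, and the in-sync replicas (`Isr`). If you want to inspect this information programmatically, the text output can be parsed. The sketch below assumes the tab-separated `Name: value` layout shown above and numeric leader ids; `parse_describe` is a hypothetical helper, not part of any Kafka tooling.

```python
def parse_describe(output):
    """Parse the per-partition lines of `kafka-topics.sh --describe` output."""
    partitions = []
    for line in output.splitlines():
        fields = {}
        for chunk in line.strip().split("\t"):
            name, sep, value = chunk.partition(":")
            if sep:
                fields[name.strip()] = value.strip()
        if "Partition" in fields:  # the summary line has no "Partition" field
            partitions.append({
                "partition": int(fields["Partition"]),
                "leader": int(fields["Leader"]),
                "replicas": [int(x) for x in fields["Replicas"].split(",") if x],
                "isr": [int(x) for x in fields["Isr"].split(",") if x],
            })
    return partitions


# First lines of the output shown above.
sample = (
    "Topic: bigdata.test1\tPartitionCount: 5\tReplicationFactor: 1\tConfigs: \n"
    "\tTopic: bigdata.test1\tPartition: 0\tLeader: 0\tReplicas: 0\tIsr: 0\n"
    "\tTopic: bigdata.test1\tPartition: 1\tLeader: 0\tReplicas: 0\tIsr: 0\n"
)
for entry in parse_describe(sample):
    print(entry)
```

With a single broker, every partition has broker 0 as leader and as its only replica; with a higher replication factor the `replicas` and `isr` lists would contain several broker ids.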


### Consuming a topic from a given offset
We can also start consuming a topic at a specific offset. If you run the following command, you'll see that the first 10 messages of partition 0 (offsets 0 to 9) are skipped.

In [9]:
# Interrupt the execution of this cell if it takes too long!
!kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic $USER.test1 --partition 0 --offset 10 2>/dev/null

ALSATIAN WOLF-DOG                                                226
    Photo: Lafayette.

THE POLAR BEAR OF THE FAR NORTH                                  227
    Photo: W. S. Berridge.

AN ALLIGATOR "YAWNING" IN EXPECTATION OF FOOD                    227
    From the Smithsonian Report, 1914.

BABY ORANG                                                       232
    Photo: W. P. Dando.

ORANG-UTAN                                                       232
    Photo: Gambier Bolton.

CHIMPANZEE                                                       233
    Photo: James's Press Agency.

BABY ORANG-UTAN                                                  233
    Photo: James's Press Agency.

ORANG-UTAN                                                       233
    Photo: James's Press Agency.

BABY CHIMPANZEES                                                 233
    Photo: James's Press Agency.

CHIMPANZEE                                                       238
  

has a diameter of no less than eight feet four inches.

[Illustration: THE YERKES 40-INCH REFRACTOR

(The largest _refracting_ telescope in the world. Its big lens weighs
1,000 pounds, and its mammoth tube, which is 62 feet long, weighs about
12,000 pounds. The parts to be moved weigh approximately 22 tons.

The great _100-inch reflector_ of the Mount Wilson reflecting
telescope--the largest _reflecting_ instrument in the world--weighs
nearly 9,000 pounds and the moving parts of the telescope weigh about
100 tons.

The new _72-inch reflector_ at the Dominion Astrophysical Observatory,
near Victoria, B. C., weighs nearly 4,500 pounds, and the moving parts
about 35 tons.)]

[Illustration: _Photo: H. J. Shepstone._

THE DOUBLE-SLIDE PLATE HOLDER ON YERKES 40-INCH REFRACTING TELESCOPE

The smaller telescope at the top of the picture acts as a "finder"; the
field of view of the large telescope is so restricted that it is
difficult to recognise, as it were, the part of

moles, freshwater mammals, like duckmole and beaver, shore-frequenting
seals and manatees, and open-sea cetaceans, some of which dive far more
than full fathoms five. It is important to realise the perennial
tendency of animals to conquer every corner and to fill every niche of
opportunity, and to notice that this has been done by successive sets of
animals in succeeding ages. _Most notably the mammals repeat all the
experiments of reptiles on a higher turn of the spiral._ Thus arises
what is called convergence, the superficial resemblance of unrelated
types, like whales and fishes, the resemblance being due to the fact
that the different types are similarly adapted to similar conditions of
life. Professor H. F. Osborn points out that mammals may seek any one of
the twelve different habitat-zones, and that in each of these there may
be six quite different kinds of food. Living creatures penetrate
everywhere like the overflowing waters of a great river in flood.


§ 3


^C
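The same "one partition, starting at a given offset" read can be sketched with kafka-python, using manual partition assignment (`assign`) followed by `seek`. `read_from_offset` is a hypothetical helper; the fallback definition of `TopicPartition` is only there so that the sketch can be tried without kafka-python installed (the real class is likewise a named tuple of topic and partition).

```python
try:
    from kafka import TopicPartition
except ImportError:
    # Assumption: stand-in with the same two fields as kafka-python's class.
    from collections import namedtuple
    TopicPartition = namedtuple("TopicPartition", ["topic", "partition"])


def read_from_offset(consumer, topic, partition, offset, max_messages):
    """Read up to max_messages values from one partition, starting at offset."""
    tp = TopicPartition(topic, partition)
    consumer.assign([tp])      # manual assignment instead of subscribing
    consumer.seek(tp, offset)  # the next fetch starts at this offset
    values = []
    for message in consumer:
        values.append(message.value)
        if len(values) >= max_messages:
            break
    return values


# Usage against a real broker (assumes a broker on localhost:9092):
#   from kafka import KafkaConsumer
#   consumer = KafkaConsumer(bootstrap_servers=["localhost:9092"],
#                            consumer_timeout_ms=5000)  # stop iterating after 5 s without messages
#   print(read_from_offset(consumer, "bigdata.test1", 0, 10, max_messages=5))
```

Bounding the read with `max_messages` and a consumer timeout avoids having to interrupt the kernel, unlike the console consumer above.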


## Producing through Python

In [10]:
!kafka-topics.sh --delete --zookeeper localhost:2181 --topic "$USER.test1" 2>/dev/null
!kafka-topics.sh --list --zookeeper localhost:2181 2>/dev/null 
!kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 5 --topic "$USER.test1" 2>/dev/null

Topic bigdata.test1 is marked for deletion.
Note: This will have no impact if delete.topic.enable is not set to true.
__consumer_offsets
Created topic bigdata.test1.


In [11]:

import os
import pwd
from kafka import KafkaProducer

topic = pwd.getpwuid( os.getuid() )[ 0 ] + ".test1"
producer = KafkaProducer(bootstrap_servers=["localhost:9092"])

producer.send(topic, b'Hello world !!!')
producer.send(topic, b'Ok, go check the consumer !')
producer.send(topic, b'Bye!')
print("Message sent to topic : "+topic)

# you can also send messages as key/value pairs. 
# It is ensured that all messages with the same key (if it is not None) will end up in the same partition.
# The key needs to be of type bytes or bytearray 
producer.send(topic, key=b"some_key_1", value=b'Hello world with a key!')
producer.flush()
producer.close()

Message sent to topic : bigdata.test1


## Consuming through Python

In [12]:
# Note : this script keeps running waiting for new messages. Interrupt the kernel to abort it!
import os
import pwd
from kafka import KafkaConsumer

topic = pwd.getpwuid( os.getuid() )[ 0 ] + ".test1"

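# auto_offset_reset='earliest' makes the consumer start from the oldest available message when it has
# no committed offset for the topic; the default ('latest') only delivers messages published after it starts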
consumer = KafkaConsumer(topic, bootstrap_servers=["localhost:9092"], auto_offset_reset='earliest')

print("Waiting for data to consume from topic %s..." % topic)

for message in consumer:
    print(message.topic, message.partition, message.key, message.value)
    
# If your messages are JSON documents, a value deserializer can be more practical (requires `import json`)
# KafkaConsumer(value_deserializer=lambda m: json.loads(m.decode('ascii')))

# If you want to consume all topics whose names match a pattern
# consumer = KafkaConsumer()
# consumer.subscribe(pattern='^awesome.*')

Waiting for data to consume from topic bigdata.test1...
bigdata.test1 4 None b'Ok, go check the consumer !'
bigdata.test1 4 None b'Bye!'
bigdata.test1 2 None b'Hello world !!!'
bigdata.test1 1 b'some_key_1' b'Hello world with a key!'


KeyboardInterrupt: 
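Since Kafka only transports bytes, structured messages need a serializer on the producer side and a matching deserializer on the consumer side. The sketch below shows such a pair for JSON. `to_bytes` and `from_bytes` are hypothetical names, the payload is made up, and UTF-8 is an assumption (the commented example above uses ASCII).

```python
import json


def to_bytes(obj):
    """Serialize a Python object to UTF-8 encoded JSON bytes."""
    return json.dumps(obj).encode("utf-8")


def from_bytes(raw):
    """Inverse of to_bytes: decode UTF-8 JSON bytes into a Python object."""
    return json.loads(raw.decode("utf-8"))


event = {"book": "pg20417", "lines": 3}  # made-up payload for illustration
roundtrip = from_bytes(to_bytes(event))
print(roundtrip == event)

# Usage with kafka-python (needs a running broker, hence left as comments):
#   producer = KafkaProducer(bootstrap_servers=["localhost:9092"],
#                            value_serializer=to_bytes)
#   consumer = KafkaConsumer(topic, bootstrap_servers=["localhost:9092"],
#                            value_deserializer=from_bytes)
```

With these options set, `producer.send(topic, event)` accepts a dictionary directly and `message.value` on the consumer side is already a Python object.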

# Exercises

1. Create a new Kafka topic named `<userid>.books`, where `<userid>` is your netid.
2. Write a Python script that reads all the books downloaded previously and stores them in the Kafka topic `<userid>.books`. Make sure that all lines of the same book are put in the same topic partition. (Hint: use the fact that messages with the same key are put in the same partition.)
3. Create a new Kafka topic named `<userid>.book-count`.
4. Write a Python script that reads from the `<userid>.books` topic and counts the number of lines in each book. Whenever a count is updated, the script publishes a message `(bookname, current_count)` to `<userid>.book-count`.

*Important*: for the last Python script, pass the options `auto_offset_reset='earliest'` and `enable_auto_commit=False` when creating the KafkaConsumer object. This ensures that you can re-read the stream from the beginning each time you re-execute the script.