In [1]:
import warnings
warnings.filterwarnings('ignore')

Sentence Transformers Model Architecture
Transformer Models: The library utilizes transformer models like BERT, RoBERTa, DistilBERT, etc., which are pre-trained on large corpora and can be fine-tuned for specific tasks.
Pooling Strategies: After processing input text through a transformer, a pooling layer aggregates token embeddings to form a single embedding for the entire sentence or text. Common strategies include mean, max, and CLS token pooling.
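The three pooling strategies named above can be sketched in plain Python. This is only an illustration on toy data, not the library's internal implementation: the function names `mean_pool`, `max_pool`, and `cls_pool` are hypothetical, and the attention mask (1 for real tokens, 0 for padding) is an assumption about how padded tokens are excluded.

```python
from typing import List

Vector = List[float]


def _real_tokens(token_embeddings: List[Vector], attention_mask: List[int]) -> List[Vector]:
    """Keep only the embeddings whose mask entry is non-zero (i.e. not padding)."""
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    return kept


def mean_pool(token_embeddings: List[Vector], attention_mask: List[int]) -> Vector:
    """Average each dimension over the non-padding tokens."""
    kept = _real_tokens(token_embeddings, attention_mask)
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def max_pool(token_embeddings: List[Vector], attention_mask: List[int]) -> Vector:
    """Take the largest value in each dimension over the non-padding tokens."""
    kept = _real_tokens(token_embeddings, attention_mask)
    dim = len(kept[0])
    return [max(vec[d] for vec in kept) for d in range(dim)]


def cls_pool(token_embeddings: List[Vector]) -> Vector:
    """Use the first token's embedding (the [CLS] position in BERT-style models)."""
    return list(token_embeddings[0])


# Toy input: three 2-dimensional token vectors; the last one is padding.
tokens = [[1.0, 4.0], [3.0, 2.0], [9.0, 9.0]]
mask = [1, 1, 0]

print(mean_pool(tokens, mask))  # [2.0, 3.0]
print(max_pool(tokens, mask))   # [3.0, 4.0]
print(cls_pool(tokens))         # [1.0, 4.0]
```

Whichever strategy is used, the result is a single fixed-size vector per sentence, regardless of how many tokens the sentence contains.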
Applications
Semantic Textual Similarity: By embedding sentences into high-dimensional vectors, the library can be used to compare and compute similarities between texts, useful for tasks such as duplicate detection or information retrieval.
Clustering and Classification: Sentence embeddings can be used for text classification or clustering, grouping similar texts together based on their semantic content.



Using the all-MiniLM-L6-v2 sentence-transformers model, which maps sentences to a 384-dimensional dense vector space.

In [2]:
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from pinecone import Pinecone, ServerlessSpec
import os
import time
import torch
from tqdm.auto import tqdm
from dotenv import load_dotenv

In [3]:
load_dotenv()
dataset = load_dataset('quora', split='train[240000:290000]')

In [7]:
dataset[:1]

{'questions': [{'id': [207550, 351729],
   'text': ['What is the truth of life?', "What's the evil truth of life?"]}],
 'is_duplicate': [False]}

In [8]:
questions = []
for record in dataset['questions']:
    questions.extend(record['text'])
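# Deduplicated copy, stored under the singular name `question` (set order is arbitrary).
# The prints below still use the full, non-deduplicated `questions` list.
question = list(set(questions))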
print('\n'.join(questions[:10]))
print('-' * 50)
print(f'Number of questions: {len(questions)}')

What is the truth of life?
What's the evil truth of life?
Which is the best smartphone under 20K in India?
Which is the best smartphone with in 20k in India?
Steps taken by Canadian government to improve literacy rate?
Can I send homemade herbal hair oil from India to US via postal or private courier services?
What is a good way to lose 30 pounds in 2 months?
What can I do to lose 30 pounds in 2 months?
Which of the following most accurately describes the translation of the graph y = (x+3)^2 -2 to the graph of y = (x -2)^2 +2?
How do you graph x + 2y = -2?
--------------------------------------------------
Number of questions: 100000


In [10]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
if device != 'cuda':
    print('No cuda.')
model = SentenceTransformer('all-MiniLM-L6-v2', device=device)

No cuda.


In [13]:
q="Which city has most engineers in USA?"
q_embed=model.encode(q)
q_embed.shape

(384,)

In [16]:
PINECONE_API_KEY = os.environ['PINECONE_API_KEY']

For the code below:

    create_index is a method that creates a new vector index.

    Parameters of create_index
        name=INDEX_NAME: The name of the index. It is used to reference the index in later operations, such as upserting or querying vectors.

        dimension=model.get_sentence_embedding_dimension(): The dimensionality of the vectors stored in the index. It must match the size of the model's embeddings, which is 384 for all-MiniLM-L6-v2.

        metric='cosine': The metric used to measure similarity between vectors in the index. Cosine similarity is the cosine of the angle between two vectors, so it indicates how closely they point in the same direction, regardless of their magnitude.

        spec=ServerlessSpec(cloud='aws', region='us-west-2'): The serverless deployment configuration for the index:

        ServerlessSpec: Configuration class in the Pinecone library that defines how the index is deployed.
            cloud='aws': The index is hosted on Amazon Web Services (AWS).
            region='us-west-2': The AWS region in which the index is created.
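To make the cosine metric concrete, here is a minimal pure-Python sketch of the formula. It is not how Pinecone computes scores internally; the helper `cosine_similarity` and the toy vectors are assumptions introduced only for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|), the cosine of the angle between a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


v = [1.0, 2.0, 3.0]
scaled = [10.0, 20.0, 30.0]    # same direction as v, ten times the magnitude
orthogonal = [3.0, 0.0, -1.0]  # dot product with v is 3 + 0 - 3 = 0

print(cosine_similarity(v, scaled))      # approximately 1: same orientation
print(cosine_similarity(v, orthogonal))  # 0.0: perpendicular vectors
```

Scaling a vector leaves the score unchanged (up to floating-point rounding), which is the "regardless of magnitude" property described above.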

In [20]:
pc=Pinecone(api_key=PINECONE_API_KEY)
INDEX_NAME="semanticsearchpinecone1"
if INDEX_NAME in [index.name for index in pc.list_indexes()]:
    pc.delete_index(INDEX_NAME)
    
print(INDEX_NAME)

semanticsearchpinecone1


In [21]:
pc.create_index(name=INDEX_NAME, 
                dimension=model.get_sentence_embedding_dimension(), 
                metric='cosine', 
                spec=ServerlessSpec(cloud='aws', region='us-west-2'))

In [23]:
index = pc.Index(INDEX_NAME)
print(index)

<pinecone.data.index.Index object at 0x1366d3b50>


Create embeddings and upsert. The cell below first prepares the IDs and metadata for each batch and prints them as a check.
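A minimal sketch of how each batch could be turned into records ready for upserting. The helper `iter_upsert_batches` and the stand-in encoder `fake_encode` are hypothetical names, not part of this notebook or of any library; the record layout `(id, values, metadata)` is one of the formats the Pinecone client accepts for `upsert`.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

# One record per question: (id, embedding values, metadata)
Record = Tuple[str, List[float], Dict[str, str]]


def iter_upsert_batches(
    texts: Sequence[str],
    encode: Callable[[Sequence[str]], Sequence[Sequence[float]]],
    batch_size: int = 250,
) -> Iterator[List[Record]]:
    """Yield lists of (id, vector, metadata) records, one list per batch."""
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        vectors = encode(chunk)  # one embedding per text in the chunk
        yield [
            (str(start + offset), [float(x) for x in vec], {'text': text})
            for offset, (text, vec) in enumerate(zip(chunk, vectors))
        ]


def fake_encode(batch: Sequence[str]) -> List[List[float]]:
    """Stand-in encoder for illustration: a 2-d vector built from the text length."""
    return [[float(len(text)), 1.0] for text in batch]


batches = list(iter_upsert_batches(['a', 'bb', 'ccc'], fake_encode, batch_size=2))
for batch_records in batches:
    print(batch_records)
```

With the objects defined in this notebook, the encoder could be something like `lambda b: model.encode(list(b)).tolist()`, and each yielded batch could be passed to `index.upsert(vectors=batch_records)`. That wiring is an assumption and is not executed here.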

In [27]:
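batch = 250          # number of questions per batch
q_vec_limit = 10000  # cap on the number of questions to process
questions = question[:q_vec_limit]  # `question` is the deduplicated list built earlier

import json

for i in tqdm(range(0, len(questions), batch)):
    i_end = min(i + batch, len(questions))
    # String IDs for this batch, taken from the list positions
    ids = [str(x) for x in range(i, i_end)]
    # Store each question's text as metadata
    metadatas = [{'text': text} for text in questions[i:i_end]]
    # Debug output only: nothing is embedded or upserted in this cell
    print(len(ids), ' - '*10,  metadatas)
    print("-"*50)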

  2%|▎         | 1/40 [00:01<00:39,  1.00s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': "What is the Secret Service's nickname for Trump?"}, {'text': 'Should we really worry about climate change?'}, {'text': "What does the word 'somberi' mean in Tamil?"}, {'text': 'Does 18 month are enough to crack IIT JEE?'}, {'text': 'Do teenage guys fall in love?'}, {'text': 'What did the German soldiers of WWII think of British, US, Canadian, and Soviet soldiers?'}, {'text': 'What is the chemical formula for zinc? How is this determined?'}, {'text': "I'm a 5 letter word. I am normally below you. If you remove my 1st letter, you'll find me above you. If you remove my 1st & 2nd letters, you can't see me. What am I?"}, {'text': 'Survivor TV show: when going to tribal council, how far do the players walk? They often leave during daylight and get there in the dark?'}, {'text': 'Is Google Glass a flop?'}, {'text': "Who are the likely members of Donald Trump's cabinet?"}, {'text': 'What is an OTG pen drive?'}, {'text': "Should I get a fully paid e

  5%|▌         | 2/40 [00:01<00:36,  1.03it/s]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'How would world history change if the Earth had an obliquity/tilt of 0 degrees?'}, {'text': 'Were playstation 1 games written mainly in C or MIPS assembly?'}, {'text': 'Dating and Relationships: If a girl likes me, but she blocked me on Facebook, MSN, and refused to reply to my texts, what does it mean?'}, {'text': 'Is there a way of disproving the conviction that reality is merely a construct of our mind and essentially a figment of our imagination?'}, {'text': "Why do people mark questions as ambiguous or needing editing on Quora when they're fine?"}, {'text': 'What are the best reference books for learning Java?'}, {'text': 'What is the fees of luxury and brand management course from SP Jain school of global management?'}, {'text': 'Who will be the next Chief Minister of Maharashtra?'}, {'text': 'Why the "GST\' bill is not getting passed?'}, {'text': "How do you delete an instagram account when you don't remember the password or username

  8%|▊         | 3/40 [00:02<00:31,  1.16it/s]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'Why is Frederiksberg not a part of Copenhagen?'}, {'text': 'Why are Japan and the U.S. so excited in the world to hold the fake UN backed Tribunal verdict tightly to incite Philippine against China ?'}, {'text': 'Why is Detroit in such ruin, and is it really that bad to live there, or is it just media hype?'}, {'text': 'What is the worst problem of the United States today?'}, {'text': "I can't view my friend's profile on Facebook but I can still message her, what does it mean?"}, {'text': 'Why does Google Chrome keep crashing in Windows 7?'}, {'text': 'Why is time slower near heavy objects?'}, {'text': 'What is Best book for biology class 9?'}, {'text': 'Do INTPs make good poets?'}, {'text': 'Before getting blocked I sent a message in WhatsApp and it showed two ticks at that time. Can I see if the person has seen the message or not?'}, {'text': 'What are distinct symptoms of borderline personality disorder?'}, {'text': 'Which is the best ga

 10%|█         | 4/40 [00:03<00:30,  1.17it/s]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'What steps one must take to become an actor in bollywood?'}, {'text': 'What is the greatest NBA basketball team of all time?'}, {'text': "What's your favorite Chinese food?"}, {'text': 'How do I develop a strong attitude and personality?'}, {'text': 'Can I clear IAS if I scored low in B.E & GMAT?'}, {'text': 'As a programmer, what tasks have you automated to make your everyday life easier?'}, {'text': 'Which kind of insurances does a co-working space need?'}, {'text': "Who's the most powerful in the Lord of the Rings?"}, {'text': 'What are the best home exercises to lose weight?'}, {'text': 'What is a moojuvani (మూజువాణి) vote in Assembly?'}, {'text': 'How much do you love your sleep?'}, {'text': 'What is the dating culture like at UC Berkeley?'}, {'text': "Do employees at Lowe's have a good work-life balance? Does this differ across positions and departments?"}, {'text': 'What is SORTEX rice?'}, {'text': 'How do I get wavy hair overnight?'

 12%|█▎        | 5/40 [00:04<00:28,  1.21it/s]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'Where can I get help with voice acting?'}, {'text': 'How to succeed in exams?'}, {'text': 'Can a person with a higher IQ have a lower problem solving ability than someone with a lower IQ?'}, {'text': 'What are my birds thinking?'}, {'text': 'How do I recover a forgotten Gmail password?'}, {'text': "Why don't people who lived in the 1950's have a 1950's accent?"}, {'text': 'Who is Sam Smith?'}, {'text': 'Are UFOs a conspiracy theory?'}, {'text': 'What is the difference between your head voice and your falsetto?'}, {'text': 'What are the advantages of database?'}, {'text': "What is professor Thomas Cormen's favorite data-structure?"}, {'text': 'Can Quora help you get laid?'}, {'text': 'What are some of the best SAT prep books to use?'}, {'text': 'I am planning to drive from Vancouver to Toronto end of October 2016 in a small sized car. Should I put on winter tires? Kindly advise.'}, {'text': 'What do you think about DotA/DotA2?'}, {'text': 'W

 15%|█▌        | 6/40 [00:05<00:29,  1.15it/s]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'What are the best 5 inch to 5.2 inch android smartphones in India till september 2016 under 15k?'}, {'text': 'What are the symptoms of borderline personality disorder?'}, {'text': "What does a senior analyst/associate's average work day comprise of at Morgan Stanley, Barclay's, Credit Suisse and the like in the US?"}, {'text': 'Did you ever write a poem as a casual writer and felt like showing it to the world, just for fun?'}, {'text': 'I cant request Uber, gives "error processing request"?'}, {'text': 'Is being transgender considered a mental illness? How do we know?'}, {'text': 'How long does it take Yelp to add a business listing to its search results once it is created?'}, {'text': 'What is the procedure for admission of international students to ITMO?'}, {'text': 'What is the capital of Mexico?'}, {'text': 'Have the Japanese forgiven the USA for the atomic bombings of Hiroshima and Nagasaki?'}, {'text': 'How can I download mp3 songs di

 18%|█▊        | 7/40 [00:06<00:28,  1.15it/s]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'Which ETL tools do you prefer and why?'}, {'text': 'What are the advantages and disadvantages of medical technology?'}, {'text': 'What is Puerto Rico?'}, {'text': 'Which certification has more demand in networking field CCIE Data centre or CCIE Security?'}, {'text': 'You slam on the brakes of your car in a panic, and skid a certain distance on a straight level road. If you had been traveling twice as fast, what distance would the car have skidded, under the same conditions?'}, {'text': 'Is it possible that I have depression?'}, {'text': 'What is the primary industry? What are some examples?'}, {'text': 'Where can a person with a mathematics degree work?'}, {'text': 'What do the Chinese think of Russia?'}, {'text': 'How can you make a variable number of variables in python?'}, {'text': 'How can I use Tinder on my Windows desktop?'}, {'text': 'If Michael Bloomberg ran for president in 2016, would he have won?'}, {'text': 'Could Hillary Clinto

 20%|██        | 8/40 [00:07<00:28,  1.12it/s]

--------------------------------------------------


 22%|██▎       | 9/40 [00:07<00:27,  1.14it/s]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'What causes soil erosion?'}, {'text': 'What does spirituality mean?'}, {'text': 'How can I send bulk SMS from a computer?'}, {'text': 'What is the missing item from the pattern?'}, {'text': 'Do I have a chance of getting a PhD in Cognitive Neuroscience as an applied linguistic master student who is currently working on language processing?'}, {'text': 'What is it like to be an Indian living in South Korea?'}, {'text': 'What is the full form of PhD?'}, {'text': 'What does a woman really mean when she says someone is desperate?'}, {'text': 'How do I shuffle two arrays into one array?'}, {'text': 'What can I do to get my vagina wet?'}, {'text': 'How do I delete thousands of old unread emails from my Gmail?'}, {'text': 'Why are firearms still so loud? Is it illegal to manufacture guns with built-in silencers?'}, {'text': 'Why the rain drops obtain spherical shape?'}, {'text': 'At what time should I drink green tea to be fit?'}, {'text': 'How ca

 25%|██▌       | 10/40 [00:08<00:26,  1.14it/s]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'What is the craziest story about North Korea?'}, {'text': 'What impact would GST have on agriculture sector?'}, {'text': 'By what framework may I take the application lock on iphone?'}, {'text': 'What is the way to clear eclipse console using Java?'}, {'text': 'What is the difference between attachment and love?'}, {'text': 'What is the difference between computer science, computer engineering, and software engineering?'}, {'text': 'If the Earth lost its oxygen for 5 seconds what would happen upon its return?'}, {'text': "I'm in love with my best friend, should I tell him I love him?"}, {'text': 'What are the different ways to earn money?'}, {'text': 'Which is best time to go for walk?'}, {'text': 'Is a good personal trainer worth it?'}, {'text': 'What is Bulletproof Coffee good for?'}, {'text': 'How do you push yourself beyond your limitations?'}, {'text': 'What will happen to India if Donald Trump wins?'}, {'text': 'How do I ask a girl ou

 28%|██▊       | 11/40 [00:11<00:42,  1.46s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'I am almost 500 pounds. Most of the weight is body fat. How do I go about using a skin caliper to measure body fat percentage?'}, {'text': 'Is U-Verse a cable or satellite provider?'}, {'text': 'Why do some people identify as different from their birth gender and do psychiatrists accept this as normal?'}, {'text': 'What is the small bottle of oil I got with my trimmer used for?'}, {'text': 'How do I revise for my theory test?'}, {'text': 'What was the most important event in history from the past 70 years?'}, {'text': 'What is the best method to obtain business credit? How do I avoid using my own credit?'}, {'text': 'Why are art and science equally important?'}, {'text': 'Can a J2EE developer get a job abroad?'}, {'text': 'When do the paralympic games start?'}, {'text': 'Why do people fear change?'}, {'text': 'What is the difference between the White House and the US Capitol Building?'}, {'text': 'What are examples of academic fraud?'}, {'t

 30%|███       | 12/40 [00:12<00:39,  1.41s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'In Return of the Jedi, what did Darth Vader die from?'}, {'text': 'What is it like to work at Wipro?'}, {'text': 'How do you cook brisket in the oven?'}, {'text': 'How long does it take to become director at McKinsey & Company? How much would the compensation be?'}, {'text': 'What is this steel object?'}, {'text': 'What is the Welding lap length for TMT Fe 500 for RCC Column as per IS Code?'}, {'text': 'What are the best books about network basics?'}, {'text': 'Which is a suitable inpatient drug and alcohol rehab center near Randolph County AL?'}, {'text': 'What can cause green phlegm with blood?'}, {'text': 'What is Arc?'}, {'text': 'How can I become a more attractive girl?'}, {'text': 'Do you entertain yourself?'}, {'text': 'How long does it take to film an episode of TV drama?'}, {'text': 'What is a good undergraduate GPA for getting into a top graduate program (something below 3.9/3.8)?'}, {'text': "Racism is the belief that one's race 

 32%|███▎      | 13/40 [00:13<00:32,  1.21s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'Is anyone allergic to humans?'}, {'text': 'What hotel in Yelagiri Hill-station would be safe for unmarried couples, without the harassment of police, hotel staff, and moral police?'}, {'text': 'How competitive is the hiring process at Google?'}, {'text': 'How do I implement all my ideas?'}, {'text': 'How long does it take Tylenol to be effective?'}, {'text': "What should Trump's new campaign slogan be?"}, {'text': 'How do I control my anger and have patience?'}, {'text': 'What is the best web radio streaming open source script?'}, {'text': 'What do you mean by liberal place or liberal society?'}, {'text': 'How much would you pay for a 400 joke ebook for kids?'}, {'text': 'Is there a popular website for buying/selling stuff in Romania (like eBay)?'}, {'text': 'Who is a better Person for office Hillary of Donald?'}, {'text': "What mistakes did you find in the movie 'Dangal'?"}, {'text': 'Why have the Jews chosen to be great?'}, {'text': 'Can 

 35%|███▌      | 14/40 [00:14<00:29,  1.13s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'How do you stop eating too much?'}, {'text': 'How can I calculate the atomic mass of an element?'}, {'text': 'Who first started NGO?'}, {'text': 'Is 50 Shades of Grey BDSM?'}, {'text': 'How do I reduce my weight within 7 days?'}, {'text': 'Why are beaches in India so dirty?'}, {'text': 'How should I crack NEST?'}, {'text': 'How do Indian Muslims justify Polygamy & Triple Talaq?'}, {'text': 'What are the tips to enjoy in a hostel?'}, {'text': 'Is it possible to hack WhatsApp?'}, {'text': 'Should I worry about what people think about me?'}, {'text': 'Why are resistors required in electronic circuits?'}, {'text': 'Does it matter if a husband is younger than his wife?'}, {'text': 'Which is the best method for Mathematics self study, up to doctorate level?'}, {'text': 'Why do people dislike guns?'}, {'text': 'Is IIIT, Hyderabad better for taking an MTech?'}, {'text': 'What does drowning feel like?'}, {'text': 'Why do people laugh at other people

 38%|███▊      | 15/40 [00:15<00:26,  1.07s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'Why do only some people see ghosts?'}, {'text': 'How can I help a sick baby get some sleep?'}, {'text': 'How can I sync Microsoft Outlook (mail and calendar) with Ubuntu Evolution?'}, {'text': 'What are your favourite foods?'}, {'text': 'Which browser is best to open Facebook?'}, {'text': 'What is the best weight loss story?'}, {'text': 'Why is Facebook so popular? Is it profitable to users?'}, {'text': 'Will win the 2016 presidential race?'}, {'text': 'How can I be promoted?'}, {'text': 'What is universal declaration of human rights?'}, {'text': 'What is the best place to sell my many books that are fiction?'}, {'text': 'Will post offices get mad if hundreds of packages get shipped to one PO box?'}, {'text': 'How do I learn to write grammatically correct english?'}, {'text': 'There are N teams in a cricket match. How many matches have to be played to know the winner and the runner-up?'}, {'text': 'Where can I get statistics on demographics

 40%|████      | 16/40 [00:16<00:25,  1.05s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'How many times can a women have sex in one day?'}, {'text': 'What rovers are still active on mars?'}, {'text': "What do you think of Supreme Court's decision of playing National Anthem in all cinemas?"}, {'text': 'What are some silliest questions asked on Quora?'}, {'text': 'Where can I find market size data for music on hold devices?'}, {'text': 'What is irony?'}, {'text': 'Will someone know if I check their WhatsApp last seen status often?'}, {'text': 'Why sugar is bad for body?'}, {'text': 'How can voltage be reduced using a resistor?'}, {'text': 'What is a spud barges and what are they used for?'}, {'text': 'How do I stop addiction to porn?'}, {'text': 'Why I asked question?'}, {'text': 'Which is better for an MS in CS, USC or University of Colorado Boulder?'}, {'text': 'How much do YouTubers make when each of their videos get 50k, 100k, 500k, 1m, and 1.5m views?'}, {'text': 'How do I become less shy and more socially confident?'}, {'te

 42%|████▎     | 17/40 [00:17<00:23,  1.04s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'How can you describe life?'}, {'text': 'Why is my puppy whining for no reason?'}, {'text': 'What are the ill effects of demonetization of 500 and 1000 rupee notes in India?'}, {'text': 'How did you feel after permanently deleting Facebook account?'}, {'text': 'Which is best among this two, Jbl t250si vs boat rockerz 400?'}, {'text': 'Is there any existing source code for Android video filtering like the snapchat?'}, {'text': 'How do 13-year-olds lose belly fat?'}, {'text': 'Is College Works Painting a scam? Why or why not?'}, {'text': 'What are some good internet connection options in Bhiwani?'}, {'text': 'What are some ideas of a new business with low investment to start in India?'}, {'text': 'How much does it cost to develop a Mobile App?'}, {'text': 'What is a architecture engineering?'}, {'text': 'What universities does Express recruit new grads from? What majors are they looking for?'}, {'text': 'Which are the best Bollywood movies in 

 45%|████▌     | 18/40 [00:18<00:22,  1.04s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'Is there a biological basis for different races?'}, {'text': 'What is the difference between "may" and "might"?'}, {'text': "What does it feel like to have a mother or father who doesn't love you?"}, {'text': 'The Sindhi caste is under which caste system in India: general, ST, or another?'}, {'text': 'What are your best tips to break bad habits?'}, {'text': 'How can I transfer money from HDFC account in India to Maduro Curiels Bank in Curacao using SWIFT?'}, {'text': 'Is daily sex good for health or not? What is the ideal frequency?'}, {'text': 'What is the best biography of Picasso?'}, {'text': 'How do I find whether my girl friend is cheating on me?'}, {'text': 'What is the best way to spend a weekend in Bangalore?'}, {'text': 'Part time MBA from MGSM vs Executive MBA from AGSM .Which one would you recommend?'}, {'text': 'Who is behind emotely.com?'}, {'text': 'What are some ways to track a switched off mobile after it is lost?'}, {'text'

 48%|████▊     | 19/40 [00:19<00:22,  1.08s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'How does human hair not grow infinitely?'}, {'text': 'Have you ever been betrayed? What did you do?'}, {'text': 'Why did people and also scientist believe that the Earth is flat? How?'}, {'text': 'What plant has the smallest seeds?'}, {'text': 'What small things immediately brighten your day?'}, {'text': 'What is the difference between argument and opinion?'}, {'text': "My boyfriend doesn't text me first. I always have to text him first, and after I text him first, he often doesn't text me back. Why?"}, {'text': 'What is your review of Chetan Bhagat Books English?'}, {'text': 'How do I convince my ex to date me again?'}, {'text': 'Are all things you can see in your living room produced in factories numourously?'}, {'text': 'What are the changes you want in Quora?'}, {'text': 'What is the most evil thing you have witnessed in your life?'}, {'text': 'What is the best exercise for lowering cholesterol?'}, {'text': 'Can I get a job at Google af

 50%|█████     | 20/40 [00:20<00:21,  1.05s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'Why do I always feel lonely despite having so many friends and family around me?'}, {'text': 'How much does it cost to treat lung cancer?'}, {'text': 'How do I calculate the area of R/ wall raft?'}, {'text': 'Should I join a basic job of 4.5 lpa in it mnc or go for mtech in iit or psu? (strictly in terms of growth in salary)'}, {'text': 'Who is the best at line work tattoos in London?'}, {'text': 'How reliable is Wikipedia as a source of information, and why?'}, {'text': 'Why is Pepsi Max so addictive?'}, {'text': 'What is keyword?'}, {'text': 'Which comes out first the hen or the egg?'}, {'text': 'Which phone is better than iPhone 6?'}, {'text': 'How do I increase my swimming endurance?'}, {'text': 'Where can I learn to play bass guitar in Pune?'}, {'text': "I don't get it. When I am quiet, people want to talk to me. When I am not, I am ignored. What should I do?"}, {'text': 'What are some of your favorite memes?'}, {'text': 'World of Warc

 52%|█████▎    | 21/40 [00:21<00:19,  1.04s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'How do I find connected wifi password in mobile?'}, {'text': 'Is Shifu Shaurya Bharadwaj a serving or a retired army officer? Which division or troops he is or was associated with and what was the last rank?'}, {'text': 'How is science and Sanatana Dharma related?'}, {'text': 'How can I get a OBC NCL Certificate?'}, {'text': 'How should I prepare for a business analyst interview at Google?'}, {'text': 'Why did god created mosquitoes?'}, {'text': 'Who do you think will win the 2016 Presidential Election?'}, {'text': "What are some mind blowing iPhone gadgets that most people don't know 2016?"}, {'text': 'Who are the main competitors to Woot?'}, {'text': "What's the most erotic novel you've ever read?"}, {'text': 'What knowledge is required to contribute to Github projects?'}, {'text': 'What is the best strategy to move logs across data centers? Is the best option the combination Flume/Kafka, Flume/Flume, Kafka/MirrorMaker or something else?'

 55%|█████▌    | 22/40 [00:23<00:21,  1.21s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': "My husband is charged with drug charges. The drugs he is charged with were in a locked safe in someone else's house. Does anyone know about warrants?"}, {'text': 'How do I host static content using CloudFlare?'}, {'text': "Has anyone got laid in India using Tinder? What's your story?"}, {'text': 'Sarcasm: What are some witty lines that can be used in day to day conversations?'}, {'text': 'How do you get started investing in stocks?'}, {'text': "Will Hillary Clinton pardon herself if she's indicted for a crime?"}, {'text': 'Why are many public schools still using Windows XP?'}, {'text': 'Which hospital in Kolkata provides best treatment for gallstone?'}, {'text': 'What are the best IPTVs to watch Indian channels in Canada?'}, {'text': "How can I view my wife's text messages?"}, {'text': 'How can one attend the 7th national marketing conclave of KIIT school of management ?'}, {'text': 'How do you know if a Scorpio man really likes you or hate

 57%|█████▊    | 23/40 [00:24<00:19,  1.12s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'What is the top speed of the Kawasaki KLX250 supermoto model 2007?'}, {'text': 'What is the best way to get along with people'}, {'text': "How do you read someone's mind?"}, {'text': 'What is the best 3D printer to buy in India?'}, {'text': 'How popular is Bitcoin?'}, {'text': 'What do you do to help with depersonalization?'}, {'text': 'What might happen in the U.S. if Hillary Clinton wins the presidential election?'}, {'text': "My wife wants to be separated from my parents but I don't, what to do?"}, {'text': 'Is India the most advanced in technology among developing country?'}, {'text': 'How painful is chemotherapy for the patient undergoing it?'}, {'text': 'If you accidentally hit someone with your car and kill them at night, what are the legal consequences?'}, {'text': 'What is your Top 10 books of all times?'}, {'text': 'Why is rape a crime? Why is anything a crime?'}, {'text': 'As a believer in the evolutionary process, do you believe

 60%|██████    | 24/40 [00:27<00:27,  1.71s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': "How big of a deal is Paul Manafort's resignation as Trump’s campaign chairman?"}, {'text': 'How much does the Clippers pay to rent the Staples Center per year and per game?'}, {'text': 'Do anyone have conversion chart of organic chemistry for iit jee?'}, {'text': "What do you think is George R. R. Martin's greatest weakness as a writer?"}, {'text': 'What is it like to not have kids?'}, {'text': "What was the weirdest compliment you've ever received?"}, {'text': 'What is the best picture you have ever seen in your life?'}, {'text': 'What is the difference between a religion and a cult?'}, {'text': 'What happens when my C++ program uses more memory than available in the RAM? Does it use page filing to load/store, or does the program crash?'}, {'text': 'What is the difference between viscose and polyester fabrics?'}, {'text': 'What are the most commonly used controls/plugins/third party framework for mobile friendly .net application?'}, {'text

 62%|██████▎   | 25/40 [00:28<00:23,  1.54s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'What are the benefits of 8GB RAM over 4GB RAM?'}, {'text': 'Where can I buy a cheap tent in Sydney?'}, {'text': "What's the best online community for B2B marketers?"}, {'text': 'Which is the best joke you have ever heard?'}, {'text': 'Why does the little amount of THC in marijuana smoke cause massive dehydration?'}, {'text': 'How do I hack an Instagram account?'}, {'text': 'Which is the best way of living life?'}, {'text': 'What are some of the best restaurants in America?'}, {'text': "What pattern are you repeating in your life that you'd like to break?"}, {'text': 'How important should sex be in a relationship?'}, {'text': 'Can you find someone by their phone number?'}, {'text': 'How can we earn some money in online?'}, {'text': 'Is there a professional video editor for iPhone?'}, {'text': 'What are accounting software modules?'}, {'text': 'I had a dream about my first Crush and in the dream, he asked me if I wanted to be with him. What d

 65%|██████▌   | 26/40 [00:29<00:20,  1.49s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'What will be the effect of banning 500 and 1000 Rs notes on real estate sector in India? Can we expect sharp fall in prices in short/long term?'}, {'text': 'What are the health benefits of an enema?'}, {'text': 'What is an example of an idea you first thought was ridiculous but, upon later reflection, came to believe might actually be true?'}, {'text': 'How much U.S. postage is needed to send a greeting card to Australia, letter sized, from the US?'}, {'text': 'Should I go for iPhone 7 or iPhone 7 plus?'}, {'text': 'Suppose in a given collection of 2016 integers, the sum of any 1008 integers is positive.Show that sum of all 2016 integers is positive?'}, {'text': 'What are some features Facebook is missing?'}, {'text': 'What is a good table tennis bat which under 1500 rupees?'}, {'text': 'What is Leonardo da Vinci background?'}, {'text': 'What is the cost of an average call center call in Malaysia?'}, {'text': 'In a job interview, how do I e

 68%|██████▊   | 27/40 [00:31<00:18,  1.45s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'How do you love yourself when nobody loves you?'}, {'text': 'When you meet someone, what are qualities they tend to have that make you like them?'}, {'text': "What's the best book you've never read, and never will?"}, {'text': "What are some 'must-learn' math algorithms to do well in competitive programming?"}, {'text': 'How do I prepare for BPSC?'}, {'text': 'How do I switch from a STATIC IP to a DHCP IP?'}, {'text': 'I love my cousin. can I marry her?'}, {'text': "Why can't we stop corruption?"}, {'text': 'What are the most common traffic convictions in Arkansas, and how does the severity of the convictions differ in Indiana?'}, {'text': 'If a non-vegetarian guy asks a pure vegetarian guy that they are also killing plants to eat, "so what\'s wrong with killing animals to eat?", then what\'s the best answer to reply?'}, {'text': 'During a Presidential debate, if the candidates wanted to keep debating beyond the allotted hour, would the new

 70%|███████   | 28/40 [00:32<00:16,  1.41s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'How did Pratibha Patil become the President of India?'}, {'text': 'In ice skating, why are there no same-sex couples?'}, {'text': 'My dog is not able to stand on his hind legs. What can I do to help?'}, {'text': 'Is there any hack or trick available to send messages to a guy, who has blocked me on WhatsApp?'}, {'text': 'Which is the worst mobile network service provider in India?'}, {'text': 'How did Donald Trump won the 2016 USA presidential election?'}, {'text': 'What is the chemical equation of perspiration?'}, {'text': 'It is almost certain that extraterrestrial life exists. How is the alien/God discourse likely to play out if/when we get a visit?'}, {'text': 'How much alcohol can one consume when one is pregnant?'}, {'text': 'Are Greek and Sanskrit related?'}, {'text': 'Why/how can some animals sleep less?'}, {'text': 'Why are people so patriotic?'}, {'text': 'How do you find the value of [math]\\sqrt{3 + \\sqrt{3 - \\sqrt{3 + …}}}[/ma

 72%|███████▎  | 29/40 [00:33<00:14,  1.32s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'Have you ever had a dream that became true?'}, {'text': 'Is vaseline a good moisturizer? Why or why not?'}, {'text': 'Do you need to wash your hair with shampoo after putting eggs and yogurt?'}, {'text': 'How secure is KeePassdroid?'}, {'text': 'Why do so many Indians hate the AAP and Arvind Kejriwal?'}, {'text': 'Is this a good book for learning Java: "Java the Complete Reference" by Herbert Shildt?'}, {'text': "Whiskey: How many 'shots' are in a fifth of Jack Daniel's?"}, {'text': 'How do I get a sim card?'}, {'text': "How does Amazon's wish list work?"}, {'text': 'Where do I catch a Doduo in Pokémon GO?'}, {'text': 'What are some tips on making it through the job interview process at Banco Popular?'}, {'text': 'What universities does FedEx recruit new grads from? What majors are they looking for?'}, {'text': 'What is resonant frequency?'}, {'text': 'How would the United States and the world be affected if Donald Trump becomes the preside

 75%|███████▌  | 30/40 [00:35<00:13,  1.33s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'What would you call someone who gets very excited about things at the beginning but loses interest quickly?'}, {'text': "How Darwin's theory of evolution is right?"}, {'text': 'Does Ne-Yo have a brother?'}, {'text': 'What is the GATE exam in engineering?'}, {'text': 'How do you zip a file on Mac?'}, {'text': 'How do you fix an air conditioner that is not cooling?'}, {'text': 'How do I start a talk with stranger? Specially a girl?'}, {'text': 'How is at skiing Vail, CO?'}, {'text': 'What are the coolest words in the English language, and what do they mean?'}, {'text': 'How widely accepted are credit cards at small businesses and restaurants in Israel?'}, {'text': "Where's the best place to buy lighter fluid for zippos?"}, {'text': 'Is drinking Tropicana orange daily good for health?'}, {'text': 'Is it possible to deflect a meteor crash with 2 to 3 hours of notice?'}, {'text': 'How do I forget the person I love the most?'}, {'text': 'How did 

 78%|███████▊  | 31/40 [00:36<00:12,  1.41s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'Is forex trading in India legal?'}, {'text': 'Is it healthy to eat egg whites every day?'}, {'text': 'How do I convince my parents to let my younger brother go for courses other than a BTech/MBBS?'}, {'text': 'What are the differences between Chinese and western cultures?'}, {'text': 'What does New Relic do? And how can I use their service?'}, {'text': 'Is React.js faster than that of JQuery?'}, {'text': 'Who operates Quora’s Twitter account?'}, {'text': 'Why do I get asked so many questions on Quora?'}, {'text': 'What is the most popular recipe in the United States?'}, {'text': 'Why does my female dog Emma hump my leg?'}, {'text': 'What are Tier-1, tier-2, tier-3 ISP?'}, {'text': 'How do I unlock HTC desire 520?'}, {'text': 'I think I hit the bottom of my car on a rough curb. Should I be worried?'}, {'text': 'How can I earn money online, seriously?'}, {'text': 'Will American parenting in the future going to be very overprotective?'}, {'tex

 80%|████████  | 32/40 [00:38<00:11,  1.43s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'What inspires you to inspire others?'}, {'text': 'What internet speed is required for 4K cloud gaming at 60 fps?'}, {'text': 'How easy is it to let go sum1 whom u love? I was trying earlier to come out of it but in vain…Yrs pass by… irony is he nvr knew that…'}, {'text': "What's your most interesting experience?"}, {'text': "In the biblical story of Adam and Eve, Eve was born from Adam's rib, what is the meaning of this?"}, {'text': 'How do I mirror a video from iPhone to Windows PC screen in fullscreen mode?'}, {'text': 'Can I watch movies online without giving a credit card number?'}, {'text': "What is a good layman's explanation for the Kullback-Leibler Divergence?"}, {'text': "How is the ISRO interview for the post of scientist/engineer 'SC' for chemical?"}, {'text': 'What are the best cities to live in the US?'}, {'text': 'What is the most widely accepted stochastic volatility model for pricing options and why?'}, {'text': 'What is the

 82%|████████▎ | 33/40 [00:39<00:09,  1.42s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'I am a 12th pass student of science. I am preparing for medical entrances. I also want to become an astronaut. What should I do in the future?'}, {'text': 'Is a range finder camera always better than SLR?'}, {'text': 'How do I make a creative birthday card for a friend?'}, {'text': 'Who will win the U.S.A presedential elections of 2016?'}, {'text': "What are the main reasons why students from universities in the US don't graduate on time (that is, within four years)?"}, {'text': 'Does Gear VR work on Nexus 4?'}, {'text': "What is the coolest thing you've done with a Raspberry Pi?"}, {'text': 'What motivates you the most?'}, {'text': 'Are there any real life megacorporations?'}, {'text': 'How can I prepare for IIT JEE 2018?'}, {'text': 'How much weight can a drone carry?'}, {'text': 'How many blogs can one create on Quora?'}, {'text': 'How should one equip oneself to do a final year project in intelligent character recognition?'}, {'text': '

 85%|████████▌ | 34/40 [00:41<00:09,  1.51s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'What are some pet names for a husband?'}, {'text': 'Can I use Quora to promote my website?'}, {'text': 'How is improve my communication?'}, {'text': 'How do I evaluate the integral [math]\\int\\frac{1}{(1+x^3)^3}dx[/math]?'}, {'text': 'What is the difference between implements and extends?'}, {'text': 'What is the best way to increase vocabulary?'}, {'text': 'Why is the US Supreme Court ruling to legalize same-sex marriage more important than letting individual states define marriage law?'}, {'text': "Though people tell me I am pretty, why do I sometimes feel not pretty or I don't like what I see in the mirror?"}, {'text': 'Why are deep neural networks so bad with sparse data?'}, {'text': 'How do you fix the SSL Connection Error on Google Chrome?'}, {'text': 'Do most Israeli favor settlement expansion in West Bank?'}, {'text': 'Were/are Donald Trump and Hillary Clinton friends?'}, {'text': 'Does schizophrenia have anything to do with the di

 88%|████████▊ | 35/40 [00:42<00:07,  1.51s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'What type of address proof is required for passport police verification?'}, {'text': 'What is it like to work for Vice President Joe Biden?'}, {'text': 'Is ammonia covalent or ionic? Why?'}, {'text': 'In which culture(s) is it considered feminine for a man to sit crossed leg in chairs?'}, {'text': 'What are the proofs that aliens really exist?'}, {'text': 'Where can I get affordable party photo booth services in Sydney?'}, {'text': 'Why do cats love catnip?'}, {'text': 'What is influencer marketing?'}, {'text': "A friend's WhatsApp account is hacked by MAC address spoofing. Even though the chats are stored locally, will it visible to the hacker? And How can it be recovered?"}, {'text': 'What is better, East Coast or West Coast for travelling?'}, {'text': 'What is company?'}, {'text': 'Where can I sell my idea?'}, {'text': 'Why is Scotland holding a referendum?'}, {'text': 'What am I supposed to do when someone pokes me on Facebook?'}, {'tex

 90%|█████████ | 36/40 [00:43<00:05,  1.41s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'In how much time does a normal passport come after police verification?'}, {'text': 'Were minimum wages ever intended to be a living wage?'}, {'text': 'Is it dangerous to be in a relationship with a psychopath?'}, {'text': 'What are some foods that begin with the letter v?'}, {'text': 'How does surfcanyon.com work?'}, {'text': 'What is the best food to eat on an empty stomach?'}, {'text': 'Does anyone with 98+ percentile in cat go for an IIT MBA?'}, {'text': 'Which sports team is better, Manchester United or Liverpool?'}, {'text': 'Where do I find my vehicle registration number? Is it the same as my plate number?'}, {'text': 'What are some of the best responses to "Sell me this pen/pencil" in a job interview?'}, {'text': 'What are some of the best answers on Quora?'}, {'text': 'What are the most violated human rights in India?'}, {'text': 'What if a producer overpopulates in an ecosystem?'}, {'text': 'Where can I find custom made wedding ca

 92%|█████████▎| 37/40 [00:45<00:04,  1.39s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'What is the Sahara, and how do the average temperatures there compare to the ones in the Simpson Desert?'}, {'text': 'How do I organize files for big project?'}, {'text': 'What are some habits that smart and highly intelligent people have?'}, {'text': 'What is the best book to know more about lord Shiva?'}, {'text': 'What is an example of an electrolyte?'}, {'text': 'How do I treat depression without medication?'}, {'text': 'Where can I find creative writers?'}, {'text': 'What is mean by capability curve?'}, {'text': 'How do I find corporate trainer in Bangalore?'}, {'text': 'What factors are stifling the startup entrepreneurship ecosystem in Vietnam?'}, {'text': 'Which car should I buy within 10 lakhs?'}, {'text': 'What do the amount of X mean on a text message?'}, {'text': 'How would an average university graduate best survive prison?'}, {'text': 'Where do Christians stand on the philosophical dilemma of the Euthyphro question? Why?'}, {'

 95%|█████████▌| 38/40 [00:46<00:02,  1.28s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'How do I create a middle finger symbol with text?'}, {'text': 'What are the types of Rationality offered by Max Weber?'}, {'text': 'How do you reverse a linked list?'}, {'text': 'How was education during the Japanese occupation like in Singapore?'}, {'text': "In India, what are the things that women can do easily, but men can't?"}, {'text': 'How does spilled water damage a laptop?'}, {'text': 'What is your most embarrassing injury?'}, {'text': 'If I forgot my Facebook password, how do I reset it?'}, {'text': 'How can sell my land online?'}, {'text': "What's the difference between performance and efficiency?"}, {'text': 'What is an example of the word "dour" in a sentence?'}, {'text': 'Can I know the basics of hacking?'}, {'text': 'What statistical procedure would be used to determine black money after demonitisation?'}, {'text': 'What are some of the best places to visit in your country?'}, {'text': 'I have a client in the US who is ready t

 98%|█████████▊| 39/40 [00:47<00:01,  1.23s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'What are the distinct behavioral differences between an introvert and a schizoid?'}, {'text': 'What is global solidarity?'}, {'text': 'What can you change in your life to become an everyday hero?'}, {'text': 'How do I recover a lost password or account for gmail?'}, {'text': 'Which other countries did demonetization?'}, {'text': 'How can you study economics on Quora?'}, {'text': 'Is sex painful during pregnancy?'}, {'text': 'Why do some people believe that the world is flat?'}, {'text': 'How can I get back my original skin colour? Its tanned.'}, {'text': 'Is it ok if a IT professional works on a freelancing work?'}, {'text': 'How do I prepare for the NMAT in 10 days?'}, {'text': "How long would it take to load a 1GB file into RAM if the speed of the drive the file was stored on didn't matter?"}, {'text': 'Can $B creative breakthroughs like Uber happen on demand?'}, {'text': 'What are the problems after upgrading to Windows 10?'}, {'text': '

100%|██████████| 40/40 [00:48<00:00,  1.21s/it]

250  -  -  -  -  -  -  -  -  -  -  [{'text': 'What are some major technical breakthroughs in recent video games?'}, {'text': 'How can I get 100 likes on my Facebook photo?'}, {'text': 'What is the difference between Sin and Crime?'}, {'text': 'What are the sites to download TV series?'}, {'text': 'Will the 2016 US presidential election be close?'}, {'text': 'What is the best SEO company in Delhi, India?'}, {'text': "Moto g4 screen is all black lights light up but can't see anything reset tricks do not work?"}, {'text': 'How can I publish theses on agriculture on the International Library for Thesis?'}, {'text': 'What is best i5 6th or 7th gen?'}, {'text': 'What is the way to overcome the stage fear?'}, {'text': 'What is your horror story at NIT?'}, {'text': "What's the best way to extend battery life on my Android phone?"}, {'text': 'Hypothetical scenario: the US military is forced to choose between giving up all its Apache Gunships or all its Black Hawks. Which do they choose?'}, {'te




Insert/upsert into Pinecone with embeddings and metadata

In [28]:
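for i in tqdm(range(0, len(questions), batch)):
    # end index of the current batch, clamped to the list length
    i_end = min(i+batch, len(questions))
    # use each question's position in `questions` as its string ID
    ids = [str(x) for x in range(i, i_end)]
    # keep the original question text as metadata
    metadatas = [{'text': text} for text in questions[i:i_end]]
    # embed the batch of questions
    xc = model.encode(questions[i:i_end])
    # pair each ID with its vector and metadata, then upsert the batch
    records = zip(ids, xc, metadatas)
    index.upsert(vectors=records)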

100%|██████████| 40/40 [01:06<00:00,  1.66s/it]
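
The batching logic of the loop above can be sketched on its own, without a model or an index. This is a minimal illustration, not the notebook's code: batch_records is a hypothetical helper, and the texts and vectors are placeholder data standing in for the questions and their embeddings. The last line checks the arithmetic behind the progress bar, assuming 10,000 questions and a batch size of 250 (both inferred from the output shown here, not stated explicitly).

```python
import math

def batch_records(texts, vectors, batch_size):
    """Yield one list of (id, vector, metadata) tuples per batch."""
    for start in range(0, len(texts), batch_size):
        # clamp the end index so the last batch may be shorter
        end = min(start + batch_size, len(texts))
        yield [
            (str(i), vectors[i], {'text': texts[i]})
            for i in range(start, end)
        ]

# Placeholder data (hypothetical), not real questions or embeddings.
texts = [f'question {i}' for i in range(10)]
vectors = [[float(i), 0.0] for i in range(10)]

batches = list(batch_records(texts, vectors, batch_size=4))
print([len(b) for b in batches])   # [4, 4, 2]
print(math.ceil(10000 / 250))      # 40 batches, matching the 40/40 progress bar
```

Each yielded list has the same (id, vector, metadata) shape that the loop builds with zip before calling index.upsert.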


In [29]:
index.describe_index_stats()

{'dimension': 384,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 10000}},
 'total_vector_count': 10000}

The code below:

index.query(...): This call performs the actual search against index, the Pinecone index populated above.

top_k=10: The query returns the 10 results most similar to the provided vector.

vector=embedding: The query vector, i.e. the embedding of the query text, produced by the same model that embedded the stored questions.

include_metadata=True: The query returns the metadata stored with each result. Here the metadata holds the original question text, which is what gets printed.

include_values=False: The vector values of the results are not included in the response. This reduces response size when only the metadata is needed.

    > What does include_values=False do?
        A vector search query can return several pieces of information about each match:

            Metadata: Data stored alongside each vector, such as descriptions, titles, or other contextual information. Metadata makes it possible to tell, in human-readable form, what each vector represents.

            Vector values: The actual numerical components of each vector. Vectors are how items are stored and searched within the system; they represent each item as a point in a mathematical space.

        With include_values=False, the vector values are left out, so each match carries its ID, its similarity score and (because include_metadata=True) its metadata.
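
To make these parameters concrete, here is a toy in-memory search written in plain Python. It is a sketch under stated assumptions, not Pinecone's implementation: toy_query and store are hypothetical names, cosine similarity is assumed as the metric, and the toy simply omits the 'values' field when include_values=False, whereas the real client's response shape may differ.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def toy_query(store, vector, top_k=10, include_metadata=False, include_values=False):
    """store maps id -> (values, metadata); returns the top_k most similar items."""
    # score every stored vector against the query, best first
    scored = sorted(
        ((cosine(vector, values), id_) for id_, (values, _) in store.items()),
        reverse=True,
    )
    matches = []
    for score, id_ in scored[:top_k]:
        values, metadata = store[id_]
        match = {'id': id_, 'score': score}
        if include_metadata:
            match['metadata'] = metadata   # human-readable context
        if include_values:
            match['values'] = values       # raw vector components
        matches.append(match)
    return {'matches': matches}

# Hypothetical two-dimensional "embeddings" for illustration only.
store = {
    '0': ([1.0, 0.0], {'text': 'first'}),
    '1': ([0.0, 1.0], {'text': 'second'}),
    '2': ([1.0, 1.0], {'text': 'third'}),
}
result = toy_query(store, [1.0, 0.0], top_k=2, include_metadata=True)
for m in result['matches']:
    print(round(m['score'], 2), m['metadata']['text'])
```

With top_k=2, only the two closest items come back, each with a score and metadata but no vector values.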

In [38]:
# small helper function so we can repeat queries later
def query(text):
    # embed the query text with the same model used for the stored questions
    embedding = model.encode(text).tolist()
    results = index.query(top_k=10, vector=embedding, include_metadata=True, include_values=False)
    # print each match as "score: question text"
    for result in results['matches']:
        print(f"{round(result['score'], 2)}: {result['metadata']['text']}")

In [39]:
query('AM I BATMAN!!!!')

0.51: Who is a better superhero: Batman or Spider-Man?
0.5: Why does Joker in Batman so famous?
0.45: Who would win a fight between wolverine and Batman?
0.45: What fighting style does Batman use in Batman vs Superman?
0.44: How should I prepare myself to be a real-life Joker as in The Dark Knight?
0.43: Is the Joker in the Suicide Squad comics?
0.4: If Batman had gone to Hogwarts, which house would he be in?
0.37: What is the full form of DC in DC comics?
0.37: The Dark Knight Rises (2012 movie): Why didn't Batman kill Bane in this scene when he had a clear shot at him with "the bat"?
0.36: Why does The Joker burn the money in The Dark Knight?
