https://www.ncbi.nlm.nih.gov/dbvar/content/tools/entrez/

Test that your search returns results on the web first
(https://www.ncbi.nlm.nih.gov/dbvar) before using it
in your Python script. Available dbVar search terms are listed on the help page
(https://www.ncbi.nlm.nih.gov/dbvar/content/help/#entrezsearch).
For general Entrez help and Boolean searching, see the online book
(https://www.ncbi.nlm.nih.gov/books/NBK3837/#EntrezHelp.Entrez_Searching_Options).

This example uses three eUtils History Server parameters:
usehistory, WebEnv, and query_key. Using them in your pipelines and
scripts is highly recommended.

/usehistory=/
When usehistory is set to 'y', ESearch will post the UIDs resulting from the
search operation onto the History server so that they can be used directly in
a subsequent E-utility call. Also, usehistory must be set to 'y' for ESearch
to interpret query key values included in term or to accept a WebEnv as input.

/WebEnv=/
Web environment string returned from a previous ESearch, EPost or ELink call.
When provided, ESearch will post the results of the search operation to this
pre-existing WebEnv, thereby appending the results to the existing
environment. In addition, providing WebEnv allows query keys to be used in
term so that previous search sets can be combined or limited. As described
above, if WebEnv is used, usehistory must be set to 'y' (e.g.
esearch.fcgi?db=dbvar&term=asthma&WebEnv=<webenv string>&usehistory=y).

/query_key=/
Integer query key returned by a previous ESearch, EPost or ELink call. When
provided, ESearch will find the intersection of the set specified by query_key
and the set retrieved by the query in term (i.e. joins the two with AND). For
query_key to function, WebEnv must be assigned an existing WebEnv string and
usehistory must be set to 'y'.
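
As a rough sketch of how the three parameters fit together, the helper below builds ESearch request URLs with the Python standard library only; it sends nothing over the network. The function name esearch_url, the second search term, and the NCID_EXAMPLE WebEnv value are placeholders invented for illustration.

```python
from urllib.parse import urlencode

# Public E-utilities endpoint.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term, webenv=None, query_key=None):
    """Build an ESearch URL that posts its result set to the History server."""
    params = {"db": db, "term": term, "usehistory": "y"}
    if webenv is not None:
        # Append this search to an existing web environment.
        params["WebEnv"] = webenv
    if query_key is not None:
        # Intersect (AND) with a set already stored in that environment.
        if webenv is None:
            raise ValueError("query_key requires an existing WebEnv")
        params["query_key"] = str(query_key)
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


# First search: creates a new WebEnv on the server.
first_url = esearch_url("dbvar", "asthma")
# Follow-up search: placeholder WebEnv and query key from the first response.
second_url = esearch_url("dbvar", "deletion", webenv="NCID_EXAMPLE", query_key=1)
print(first_url)
print(second_url)
```

Keeping usehistory fixed at 'y' inside the helper mirrors the rule above that both WebEnv and query_key depend on it.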

Load Python modules
May require a one-time install of biopython and xmltodict.

In [22]:
from Bio import Entrez
import xmltodict
import Bio
print(Bio.__version__)

1.74


In [23]:
# initialize some default parameters
Entrez.email = 'myemail@ncbi.nlm.nih.gov' # provide your email address
db = 'GDS'                              # set search to the GEO DataSets (GDS) database
paramEutils = { 'usehistory':'Y' }        # Use Entrez search history to cache results


https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2770

In [75]:
# generate query to Entrez eSearch
#eSearch = Entrez.esearch(db=db, term='(GSE2770[All Fields])', **paramEutils)

# search the same database (db) that eSummary queries below, so the WebEnv/query_key pair stays valid;
# GDS has no [Gene Symbol] field, so Entrez falls back to All Fields (see ErrorList in the output)
eSearch = Entrez.esearch(db=db, term='(GSM60699[Gene Symbol])', **paramEutils)

# get eSearch result as dict object
res = Entrez.read(eSearch)

# take a peek at what's in the result (i.e. WebEnv, Count, etc.)
for k in res:
    print(k, "=", res[k])

paramEutils['WebEnv'] = res['WebEnv']         # add WebEnv and query_key to the eUtils parameters so eSummary uses the
paramEutils['query_key'] = res['QueryKey']    # search history (cached results) instead of IdList
paramEutils['rettype'] = 'xml'                # get report as xml
paramEutils['retstart'] = 0                   # start at result 0, the top of IdList
paramEutils['retmax'] = 5                     # return at most five results

# generate request to Entrez eSummary
result = Entrez.esummary(db=db, **paramEutils)
# get xml result
xml = result.read()
# take a peek at xml
print(xml)

Count = 3
RetMax = 3
RetStart = 0
QueryKey = 3
WebEnv = NCID_1_252301933_130.14.22.76_9001_1564695854_1949901676_0MetA0_S_MegaStore
IdList = ['200002770', '100000096', '300060699']
TranslationSet = []
TranslationStack = ['GROUP', {'Term': 'GSM60699[All Fields]', 'Field': 'All Fields', 'Count': '3', 'Explode': 'N'}, 'AND']
QueryTranslation = (#2) AND GSM60699[All Fields]
ErrorList = {'FieldNotFound': ['Gene Symbol'], 'PhraseNotFound': []}
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eSummaryResult PUBLIC "-//NLM//DTD esummary v1 20041029//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20041029/esummary-v1.dtd">
<eSummaryResult>
<DocSum>
	<Id>200002770</Id>
	<Item Name="Accession" Type="String">GSE2770</Item>
	<Item Name="GDS" Type="String">1290</Item>
	<Item Name="title" Type="String">Transcriptional profiles of Th cells induced to polarize to Th1 or Th2 direction in the presence or absence of TGFbeta</Item>
	<Item Name="summary" Type="String">Th1 and Th2 cells arise from a common
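
The ErrorList entry in the ESearch output above shows that the [Gene Symbol] field was not recognized, yet the search still returned three records. A small check such as the following sketch can surface that kind of silent fallback before the results are used. The function name search_problems is hypothetical, and the example dictionary only copies the Count and ErrorList values printed above.

```python
def search_problems(res):
    """Return human-readable warnings found in an ESearch result dictionary."""
    problems = []
    errors = res.get("ErrorList", {})
    for field in errors.get("FieldNotFound", []):
        problems.append("unknown search field: [{}]".format(field))
    for phrase in errors.get("PhraseNotFound", []):
        problems.append("phrase not found: {}".format(phrase))
    if int(res.get("Count", 0)) == 0:
        problems.append("search returned no records")
    return problems


# Values copied from the ESearch output above.
res_example = {
    "Count": "3",
    "ErrorList": {"FieldNotFound": ["Gene Symbol"], "PhraseNotFound": []},
}
for problem in search_problems(res_example):
    print(problem)
```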

In [28]:
# convert xml to a python dict object for convenient parsing
dsdocs = xmltodict.parse(xml)

# DocSum is a list with one entry per record, so indexing it with a string key raises
# "TypeError: list indices must be integers or slices, not str".
# Each entry holds an 'Id' and a list of named 'Item' fields; the dbVar keys
# (e.g. 'dbVarPlacementList') do not appear in this GDS summary.
for ds in dsdocs['eSummaryResult']['DocSum']:
    fields = {item['@Name']: item.get('#text') for item in ds['Item']}
    print(ds['Id'], fields.get('Accession'), fields.get('title'))

In [66]:
dsdocs['eSummaryResult']['DocSum'][0]['Item']

[OrderedDict([('@Name', 'Accession'),
              ('@Type', 'String'),
              ('#text', 'GSE2770')]),
 OrderedDict([('@Name', 'GDS'), ('@Type', 'String'), ('#text', '1290')]),
 OrderedDict([('@Name', 'title'),
              ('@Type', 'String'),
              ('#text',
               'Transcriptional profiles of Th cells induced to polarize to Th1 or Th2 direction in the presence or absence of TGFbeta')]),
 OrderedDict([('@Name', 'summary'),
              ('@Type', 'String'),
              ('#text',
               'Th1 and Th2 cells arise from a common precursor cell in response to triggering through the TCR and cytokine receptors for IL-12 or IL-4. This leads to activation of complex signaling pathways, which are not known in detail. Disturbances in the balance between type 1 and type 2 responses can lead to certain immune-mediated diseases. Thus, it is important to understand how Th1 and Th2 cells are generated. To clarify the mechanisms as to how IL-12 and IL-4 induce Th1 an
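
The Item list shown above is awkward to query by position. One possible way to make it easier to use is to convert each DocSum into a dictionary keyed by item name, as sketched below. The helper names item_value and docsum_fields are hypothetical, the example record is a hand-written miniature of the structure above, and the 'Sample', 'List' and 'Structure' labels in it are assumptions about the nested layout.

```python
def item_value(item):
    """Return the text of a leaf Item, or (name, value) pairs for a nested one."""
    children = item.get("Item")
    if children is None:
        return item.get("#text")
    if isinstance(children, dict):
        # xmltodict yields a dict instead of a list when there is a single child.
        children = [children]
    return [(child["@Name"], item_value(child)) for child in children]


def docsum_fields(docsum):
    """Map each top-level Item name of one DocSum to its value."""
    return dict(item_value(docsum))


# Hand-written miniature of one DocSum (nested labels are assumptions).
docsum_example = {
    "Id": "200002770",
    "Item": [
        {"@Name": "Accession", "@Type": "String", "#text": "GSE2770"},
        {"@Name": "GDS", "@Type": "String", "#text": "1290"},
        {"@Name": "Samples", "@Type": "List", "Item": [
            {"@Name": "Sample", "@Type": "Structure", "Item": [
                {"@Name": "Accession", "@Type": "String", "#text": "GSM60374"},
                {"@Name": "Title", "@Type": "String", "#text": "example title"},
            ]},
        ]},
    ],
}
fields_example = docsum_fields(docsum_example)
print(fields_example["Accession"], fields_example["GDS"])
print(fields_example["Samples"])
```

Nested items are returned as lists of pairs rather than dictionaries because repeated child names would otherwise overwrite one another.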

In [73]:
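group = 0      # index of the DocSum to inspect (0 is GSE2770)
genes = []     # collects sample accessions (GSM IDs), despite the name
for item in dsdocs['eSummaryResult']['DocSum'][group]['Item']:
    #print(item['@Name'])
    # the 'Samples' item nests one child per sample: [accession, title]
    if item['@Name'] == 'Samples' and 'Item' in item:
        for ite in item['Item']:
            if 'Item' in ite:
                print('{}, {}'.format(ite['Item'][0]['#text'], ite['Item'][1]['#text']))
                genes.append(ite['Item'][0]['#text'])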

GSM60374, Th cells_antiCD3+antiCD28+IL4+TGFbeta_6h_replicate 2a (U95Av2)
GSM60729, Th cells_antiCD3+antiCD28+IL12+TGFbeta_2h_replicate 2a (U133A)
GSM60360, Th cells_antiCD3+antiCD28+IL12_48h_replicate 1b (U95Av2)
GSM60752, Th cells_antiCD3+antiCD28+IL12_48h_replicate 2b (U133B)
GSM60354, Th cells_antiCD3+antiCD28+IL12_6h_replicate 1a (U95Av2)
GSM60377, Th cells_antiCD3+antiCD28+IL12_6h_replicate 2a (U95Av2)
GSM60709, Th cells_antiCD3+antiCD28+IL4_6h_replicate 1a (U133A)
GSM60755, Th cells_antiCD3+antiCD28+IL4+TGFbeta_48h_replicate 2b (U133B)
GSM60732, Th cells_antiCD3+antiCD28+IL12_2h_replicate 2a (U133A)
GSM60357, Th cells_antiCD3+antiCD28+IL4_2h_replicate 1a (U95Av2)
GSM60749, Th cells_antiCD3+antiCD28_48h_replicate 1b (U133B)
GSM60712, Th cells_antiCD3+antiCD28+IL12+TGFbeta_48h_replicate 1b (U133A)
GSM60380, Th cells_antiCD3+antiCD28+IL4+TGFbeta_2h_replicate 2a (U95Av2)
GSM60701, Th cells_antiCD3+antiCD28+IL4+TGFbeta_6h_replicate 1a (U133A)
GSM60715, Th cells_antiCD3+antiCD28_48h_re
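
Each sample title in the listing above ends with the array name in parentheses (U95Av2, U133A or U133B). Assuming that convention holds for every sample, the sketch below groups accessions by array with a regular expression. The function name group_by_array is hypothetical; the four example rows are copied from the listing.

```python
import re
from collections import defaultdict

# Matches a trailing "(...)" at the end of a sample title.
ARRAY_SUFFIX = re.compile(r"\(([^()]+)\)\s*$")


def group_by_array(samples):
    """Group (accession, title) pairs by the array name that ends each title."""
    groups = defaultdict(list)
    for accession, title in samples:
        match = ARRAY_SUFFIX.search(title)
        array = match.group(1) if match else "unknown"
        groups[array].append(accession)
    return dict(groups)


samples_example = [
    ("GSM60374", "Th cells_antiCD3+antiCD28+IL4+TGFbeta_6h_replicate 2a (U95Av2)"),
    ("GSM60729", "Th cells_antiCD3+antiCD28+IL12+TGFbeta_2h_replicate 2a (U133A)"),
    ("GSM60752", "Th cells_antiCD3+antiCD28+IL12_48h_replicate 2b (U133B)"),
    ("GSM60354", "Th cells_antiCD3+antiCD28+IL12_6h_replicate 1a (U95Av2)"),
]
by_array = group_by_array(samples_example)
for array, accessions in by_array.items():
    print(array, accessions)
```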

In [129]:
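import gzip
from Bio.Affy import CelFile

# The file is gzip-compressed, so decompress while reading; a plain open() hands compressed
# bytes to the parser. Version 3 (text) CEL files may need mode "rt" instead of "rb".
with gzip.open("../data/GSE2770_RAW/GSM60700.CEL.gz", "rb") as handle:
    c = CelFile.read(handle)

c.version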

In [127]:
dir(c)

['Algorithm',
 'AlgorithmParameters',
 'DatHeader',
 'GridCornerLL',
 'GridCornerLR',
 'GridCornerUL',
 'GridCornerUR',
 'NumberCells',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'intensities',
 'mask',
 'modified',
 'ncols',
 'nmask',
 'noutliers',
 'npix',
 'nrows',
 'outliers',
 'stdevs',
 'version']

In [130]:
print(c.NumberCells)
# handle

None
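
When a data directory mixes compressed and uncompressed files, one option is to look at the first two bytes and pick the opener accordingly. The sketch below does that with the standard library; the helper name open_maybe_gzip is hypothetical, and the temporary file is only a stand-in for a real CEL file.

```python
import gzip
import os
import tempfile

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzip(path, mode="rb"):
    """Open path, decompressing on the fly when it starts with the gzip magic bytes."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzip else open
    return opener(path, mode)


# Demonstration with a small temporary file (stand-in for a .CEL.gz file).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write("[CEL]\n")
    with open_maybe_gzip(demo_path, "rt") as handle:
        first_line = handle.readline()
print(repr(first_line))
```

The returned handle can then be passed to a parser in place of the result of a plain open() call.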


In [18]:
from Bio import Entrez
from Bio import SeqIO

Entrez.email = "sample@example.org"

handle = Entrez.efetch(db="nuccore",
                       id="CP001665",
                       rettype="gb",
                       retmode="text")

whole_sequence = SeqIO.read(handle, "genbank")

print(whole_sequence[6373:6422])

ID: CP001665.1
Name: CP001665
Description: Escherichia coli 'BL21-Gold(DE3)pLysS AG', complete genome
Number of features: 0
Seq('GCGCTAACCATGCGAGCGTGCCTGATGCGCTACGCTTATCAGGCCTACG', IUPACAmbiguousDNA())
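
Python slices are zero-based and exclude the end index, whereas GenBank coordinates are one-based and inclusive. The slice [6373:6422] used above therefore covers positions 6374 to 6422, which is 49 bases and matches the length of the printed sequence. The helper below is a hypothetical illustration of that conversion.

```python
def slice_to_one_based(start, stop):
    """Convert a Python slice [start:stop] to inclusive 1-based (first, last, length)."""
    if not 0 <= start < stop:
        raise ValueError("expected 0 <= start < stop")
    return start + 1, stop, stop - start


first_pos, last_pos, length = slice_to_one_based(6373, 6422)
print(first_pos, last_pos, length)  # 6374 6422 49
```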


In [77]:
import GEOparse
gse = GEOparse.get_GEO(geo="GSE2770", destdir="./")

01-Aug-2019 17:51:38 DEBUG utils - Directory ./ already exists. Skipping.
01-Aug-2019 17:51:38 INFO GEOparse - Downloading ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE2nnn/GSE2770/soft/GSE2770_family.soft.gz to ./GSE2770_family.soft.gz
01-Aug-2019 17:51:38 INFO utils - Downloading ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE2nnn/GSE2770/soft/GSE2770_family.soft.gz to ./GSE2770_family.soft.gz


D: 100% - 26.1MiB  / 26.1MiB  eta 0:00:00


01-Aug-2019 17:51:54 INFO GEOparse - Parsing ./GSE2770_family.soft.gz: 
01-Aug-2019 17:51:54 DEBUG GEOparse - DATABASE: GeoMiame
01-Aug-2019 17:51:54 DEBUG GEOparse - SERIES: GSE2770
01-Aug-2019 17:51:54 DEBUG GEOparse - PLATFORM: GPL96
01-Aug-2019 17:51:58 DEBUG GEOparse - PLATFORM: GPL97
01-Aug-2019 17:52:00 DEBUG GEOparse - PLATFORM: GPL8300
01-Aug-2019 17:52:02 DEBUG GEOparse - SAMPLE: GSM60348
01-Aug-2019 17:52:02 DEBUG GEOparse - SAMPLE: GSM60349
01-Aug-2019 17:52:03 DEBUG GEOparse - SAMPLE: GSM60350
01-Aug-2019 17:52:03 DEBUG GEOparse - SAMPLE: GSM60351
01-Aug-2019 17:52:03 DEBUG GEOparse - SAMPLE: GSM60352
01-Aug-2019 17:52:03 DEBUG GEOparse - SAMPLE: GSM60353
01-Aug-2019 17:52:03 DEBUG GEOparse - SAMPLE: GSM60354
01-Aug-2019 17:52:03 DEBUG GEOparse - SAMPLE: GSM60355
01-Aug-2019 17:52:03 DEBUG GEOparse - SAMPLE: GSM60356
01-Aug-2019 17:52:03 DEBUG GEOparse - SAMPLE: GSM60357
01-Aug-2019 17:52:03 DEBUG GEOparse - SAMPLE: GSM60358
01-Aug-2019 17:52:03 DEBUG GEOparse - SAMPLE: GS
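
The download path in the log above places GSE2770 under a GSE2nnn directory, that is, the accession with its last three digits replaced by "nnn". The sketch below rebuilds that URL from an accession. The function name series_soft_url is hypothetical, and the handling of accessions with three or fewer digits (a plain GSEnnn directory) is an assumption about the FTP layout rather than something the log shows.

```python
def series_soft_url(accession):
    """Build the family SOFT download URL for a GEO series accession."""
    digits = accession[3:]
    if not (accession.startswith("GSE") and digits.isdigit()):
        raise ValueError("expected an accession such as GSE2770")
    # Replace the last three digits with 'nnn' to get the directory stub.
    stub = "GSE" + digits[:-3] + "nnn"
    return (
        "ftp://ftp.ncbi.nlm.nih.gov/geo/series/"
        "{stub}/{acc}/soft/{acc}_family.soft.gz".format(stub=stub, acc=accession)
    )


soft_url = series_soft_url("GSE2770")
print(soft_url)
```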

In [85]:
#gse.gpls['GPL8300'].columns
gse.gsms["GSM60699"].columns

Unnamed: 0,description
ID_REF,Affymetrix U133A probe ID
VALUE,MAS5.0 calculated normalized signal intensity ...
ABS_CALL,
DETECTION P-VALUE,"'detection p-value', p-value that indicates th..."


In [93]:
gse.gsms["GSM60348"].columns

Unnamed: 0,description
ID_REF,Affymetrix U95Av2 probe ID
VALUE,MAS5.0 calculated normalized signal intensity ...
ABS_CALL,
DETECTION P-VALUE,"'detection p-value', p-value that indicates th..."


In [94]:
gse.gsms["GSM60733"].columns

Unnamed: 0,description
ID_REF,Affymetrix U133B probe ID
VALUE,MAS5.0 calculated normalized signal intensity ...
ABS_CALL,
DETECTION P-VALUE,"'detection p-value', p-value that indicates th..."


In [88]:
controls = ['GSM60348',
            'GSM60699',
            'GSM60733']

In [89]:
pivoted_control_samples = gse.pivot_samples('VALUE')[controls]
pivoted_control_samples.head()

name,GSM60348,GSM60699,GSM60733
ID_REF,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1000_at,118.5,,
1001_at,8.9,,
1002_f_at,1.6,,
1003_s_at,4.1,,
1004_at,20.6,,
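
The empty cells in the pivoted table are most likely a consequence of the three control samples coming from different arrays (U95Av2, U133A and U133B according to the column descriptions above): a probe ID that exists on one array has no value on the others. The pure-Python sketch below reproduces the effect on a tiny scale. The function name pivot_values is hypothetical, the GSM60348 values are copied from the table, and the GSM60699 probe ID and value are invented placeholders.

```python
def pivot_values(tables):
    """Pivot {sample: {probe_id: value}} into {probe_id: {sample: value or None}}."""
    samples = sorted(tables)
    probes = sorted({probe for table in tables.values() for probe in table})
    return {
        probe: {sample: tables[sample].get(probe) for sample in samples}
        for probe in probes
    }


tables_example = {
    # Values copied from the pivoted table.
    "GSM60348": {"1000_at": 118.5, "1001_at": 8.9},
    # Invented placeholder standing in for a probe found only on another array.
    "GSM60699": {"placeholder_probe_at": 1.0},
}
pivoted_example = pivot_values(tables_example)
for probe, row in pivoted_example.items():
    print(probe, row)
```

Pivoting samples from one array at a time avoids these gaps.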

In [90]:
gsm = GEOparse.get_GEO(geo="GSM60699", destdir="./")

02-Aug-2019 12:11:40 DEBUG utils - Directory ./ already exists. Skipping.
02-Aug-2019 12:11:40 INFO GEOparse - Downloading http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GSM60699&form=text&view=full to ./GSM60699.txt
02-Aug-2019 12:11:40 INFO utils - Downloading http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GSM60699&form=text&view=full to ./GSM60699.txt


Downloading: 597.1KiB / Unknown - 0.0KiB/s      

02-Aug-2019 12:11:41 INFO GEOparse - Parsing ./GSM60699.txt: 





In [107]:
# dir(gse)
dir(gse.gpls['GPL96'])

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__metaclass__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_get_columns_as_string',
 '_get_metadata_as_string',
 '_get_object_as_soft',
 '_get_table_as_string',
 'columns',
 'database',
 'geotype',
 'get_accession',
 'get_metadata_attribute',
 'get_type',
 'gses',
 'gsms',
 'head',
 'metadata',
 'name',
 'relations',
 'show_columns',
 'show_metadata',
 'show_table',
 'table',
 'to_soft']

In [117]:
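# dir(gse.gpls['GPL96'])
# gse.gpls['GPL96'].columns
# without parentheses this evaluates to the bound method itself (see output); call .head() to print the preview
gse.gpls['GPL96'].head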

<bound method SimpleGEO.head of <PLATFORM: GPL96>>